I. Real Party in Interest

Roche Palo Alto LLC, the assignee of the above-referenced patent application, is the real
party in interest.

II. Related Appeals and Interferences 

There are no related appeals or interferences. 

III. Status of Claims

Claim 1 is pending in this application. Claim 1 has been rejected under 35 USC § 101 
and 35 USC § 112, first paragraph. The rejection of Claim 1 is appealed.

IV. Status of Amendments 

No amendments have been filed subsequent to final rejection. 

V. Summary of Claimed Subject Matter 

The presently-claimed invention relates to a polypeptide composition for "(a) novel 
human sodium phosphate co-transporter expressed in intestinal epithelial cells" (stated on 
page 4, lines 9-10 in Substitute Specification filed on May 10, 2002), designated as 
Npt2B. Using the procedures disclosed in Experimental section A (page 28, line 24 to 
page 29, line 21), the Npt2B polypeptide was determined to have the amino acid 
sequence shown in Fig. 1 and identified as SEQ ID NO:1 (stated on page 5, line 5). A
description of the function of Npt2B appears on page 4, lines 19-24 in the Substitute 
Specification: 

"Npt2B is a type II sodium phosphate co-transporter. In its native environment, 
Npt2B is a co-transporter of sodium cation and phosphate anion. Npt2B is 
expressed, among other locations, on the surface of intestinal epithelial cells, i.e. 
on the apical or intestinal luminal side of the epithelial cells, and therefore 
provides for the transport of sodium and phosphate ions from the intestinal lumen
into the intestinal epithelial cells." 

The function of the Npt2B polypeptide as a human intestinal sodium phosphate co- 
transporter was derived both from the homology to published sequences for the type II 
intestinal transporters from Xenopus and mouse (page 29, lines 15-21 and references 
cited therein, Ishizuya-Oka et al., Development Genetics 20:53-66, 1997; Hilfiker et al., 
Proc. Natl. Acad. Sci. U.S.A. 95:14564-14569, 1998) and from expression of the Npt2B
cDNA in mammalian cells followed by assays for sodium-phosphate transporter activity,
as described in Experimental section B (page 29, line 23 to page 30, line 19). To date,
Npt2B is the only known type II human sodium phosphate co-transporter found in the
intestine (see Xu et al., Biochim Biophys Acta 1567:97-105, 2002; Werner & Kinne, J
Physiol Regul Integr Comp Physiol 280(2):R301-312, 2001; both references cited in the
Amendment and Response submitted by Applicants/Appellants on February 23, 2004).

Descriptions of the utilities of the claimed invention appear throughout the specification, 
and are stated, for example, on page 9, lines 28-30 as: "[t]he subject polypeptide and 
nucleic acid compositions find use in a variety of different applications, including 
research, diagnostic, and therapeutic agent screening/discovery/preparation applications, 
as well as in therapeutic compositions and methods employing the same." One specific 
utility is its use in various screening assays designed to identify therapeutic agents. As 
stated in the specification on page 17, lines 9-21:

"The subject Npt2B polypeptides find use in various screening assays designed to 
identify therapeutic agents. Thus, one can use a cell model such as a host cell, e.g. 
CHO, HEK293, COS7, Xenopus Oocyte, etc., which has been transfected in a 
manner sufficient to express Npt2B on its surface. One can then contact the cell 
with a medium comprising sodium and phosphate ions, and measure the amount 
of phosphate anions that are internalized in the cell, where measurements are 
taken in both control environments and test environments, e.g. in the presence of a 
candidate Npt2B modulator compound, e.g. an Npt2B agonist or an Npt2B 
antagonist or inhibitor. To assist in detection of Pi uptake, labeled phosphorous is 
present in the medium, where any convenient label may be employed, such as an
isotopic label, e.g. as present in 32P or 33P. Alternatively, current measurements
may be taking using well known electrophysiological methods (see e.g. 
Electrophysiology, A practical Approach (IRL Press)(1993)), from which the 
uptake of Pi may be determined. Examples of assays for measuring Pi uptake are 
provided in: Maganin et al, Proc. Nat'l Acad. Sci USA (July 1993) 90: 5979- 
5983; and Helps et al., Eur. J. Biochem. (1995) 228: 927-930." 

Another example of an Npt2B modulatory agent is "antibodies that at least reduce, if not 
inhibit the target Npt2B activity in the host. Suitable antibodies are obtained by 
immunizing a host animal with peptides comprising all or a portion of the target protein, 
e.g. Npt2B [emphasis added]" (Specification at page 21, lines 3-5). 

The significance of identifying Npt2B modulatory agents or compounds, which either
increase Npt2B activity (i.e., enhance intestinal phosphate absorption) or reduce or
inhibit Npt2B activity (i.e., stop or limit intestinal phosphate absorption), appears on
page 27, lines 19-29, as follows: 

"The subject methods [of modulating Npt2B activity] find use in the treatment of 
a variety of different disease conditions involving Npt2B activity. As such, the 
disease conditions treatable according to the subject methods include diseases 
characterized by abnormally high Pi absorption and disease conditions 
characterized by abnormally low Pi absorption. Disease conditions resulting from 
abnormally low Npt2B activity are those characterized by the presence of 
hypophosphatemia, and include: osteomalacia, hypocalciurea, rickets, and the 
like. Disease conditions resulting from abnormally high Npt2B activity are those 
characterized by the presence of hyperphosphatemia and include: 
hyperparathyroidism, hypocalcemia, vitamin D deficiency, soft tissue or 
metastatic calcification, and the like. Of particular interest is the use of the 
subject methods to treat hyperphosphatemia resulting from renal insufficiency, 
e.g. caused by renal disease resulting in at least impaired renal function, and the 
like" 

Therefore, use of the Npt2B polypeptide to identify modulatory agents that increase 
Npt2B activity (i.e. increase intestinal phosphate absorption) would be therapeutically 
desirable for the treatment of diseases resulting from hypophosphatemia. Conversely, 
use of the Npt2B polypeptide to identify or prepare modulatory agents that decrease
Npt2B activity (i.e. decrease intestinal phosphate absorption) would be therapeutically 
desirable for the treatment of diseases resulting from hyperphosphatemia. 

VI. Grounds of Rejection to be Reviewed on Appeal 

A. Whether the invention as defined by Claim 1 is patentable under 35 USC § 101 
because it has a specific and substantial utility or a well-established utility; and 

B. Whether the specification enables Claim 1 under 35 USC § 112, first paragraph,
since the invention is supported by a specific and substantial asserted utility or a well- 
established utility. 

VII. Argument

A1. Utility under 35 USC § 101

In the Office Action dated November 20, 2003 ("OA 11/20/03"), the Examiner rejected
claim 1 under 35 USC § 101 as allegedly being not supported by either a specific and 
substantial asserted utility or a well-established utility. The Examiner did not discuss the 
credibility of the utility asserted in the specification. The rejection of claim 1 under 35 
USC § 101 was maintained in the Final Office Action dated May 19, 2004 ("OA 
5/19/04"). 

B1. Standard of Rejection under 35 USC § 101

To properly reject a claimed invention under 35 USC § 101, the Examiner bears the 
burden of establishing a prima facie showing that the claimed invention lacks patentable 
utility, and needs to provide a sufficient evidentiary basis for factual assumptions relied 
upon in establishing the prima facie showing. If this initial burden is met, the burden 
then shifts to the applicant to provide evidence or argument to rebut the prima facie 
showing. This evidentiary framework applies because "lack of utility is a question of
fact." In re Swartz, 232 F.3d 862, 56 USPQ2d 1703 (Fed. Cir. 2000). Cases addressing
the standard of rejection under 35 USC § 101 include: 
In re Gaubert, 524 F.2d 1222, 1114, 187 USPQ 664, 666 (CCPA 1975) ("Accordingly,
the PTO must do more than merely question operability - it must set forth factual reasons 
which would lead one skilled in the art to question the objective truth of the statement of 
operability.") 

In re Oetiker, 977 F.2d 1443, 1445, 24 USPQ2d 1443, 1444 (Fed. Cir. 1992) ("[T]he
examiner bears the initial burden, on review of the prior art or on any other ground, of 
presenting a prima facie case of unpatentability. If that burden is met, the burden of 
coming forward with evidence or argument shifts to the applicant . . . After evidence or 
argument is submitted by the applicant in response, patentability is determined on the 
totality of the record, by a preponderance of evidence with due consideration to 
persuasiveness of argument . . . If examination at the initial stage does not produce a 
prima facie case of unpatentability, then without more the applicant is entitled to grant of 
the patent.") 

In re Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436, 1441 (Fed. Cir. 1995) ("Only after 
the PTO provides evidence showing that one of ordinary skill in the art would reasonably 
doubt the asserted utility does the burden shift to the applicant to provide rebuttal 
evidence sufficient to convince such a person of the invention's asserted utility.") 

C1. Application of Standard of Rejection under 35 USC § 101 to Claim 1

Applicants/Appellants submit that the Examiner has not met the burden of presenting a 
prima facie case that the claimed invention lacks patentable utility because he (a) failed to
provide any evidence or factual reasons why one skilled in the art would reasonably
doubt the asserted utilities of the claimed Npt2B polypeptide, and (b) misinterpreted both
the facts in the field of the art and the factual statements contained in the
specification.
In raising the rejection under 35 USC § 101 in OA 11/20/03 and maintaining the rejection
in OA 5/19/04, the Examiner argued that "(t)he utility of claimed sodium phosphate co-
transporter cannot be implicated solely from homology to known sodium phosphate co-
transporter or their protein domains" (OA 11/20/03 page 6 lines 11-13, OA 5/19/04 page
4 lines 6-8). To support his position, the Examiner relied upon three generic review
articles¹ on computational genomics (Bork & Koonin, Nature Genetics 18:313-318,
1998; Karp, Bioinformatics 14(9):753-754, 1998; Bork & Eisenberg, Current Opinion in
Structural Biology 8:331-332, 1998) and cited excerpts from each paper in OA 11/20/03
on pages 7-8 as evidence that the utility of a protein cannot be inferred solely from
homology and that this must be true for the Npt2B polypeptide.

¹ These articles are "generic" in the sense that they do not discuss Npt2B, or any other
sodium-phosphate co-transporter protein, in any way.

The Examiner's argument is both flawed and contrary to fact for the following reasons.
First, the main theme of the references cited by the Examiner is not that function cannot
be predicted solely by homology, but rather that function may not necessarily be
predictable solely from sequence homology; i.e., in some cases, function is successfully
predicted solely on the basis of homology. See, e.g., Bork & Koonin at [page #]:
"Prediction of function using comparative sequence analysis is extremely powerful but,
if not performed appropriately, may lead to the creation and propagation of assignment
errors" [emphases added]. Second, none of the references cited by the Examiner makes
any mention of sodium phosphate co-transporters. The Examiner did not and could not
provide an example of a case where a protein predicted by sequence homology to be a
type II sodium phosphate co-transporter was, in fact, not such a protein or failed to have
such activity, and Applicants/Appellants are not aware of any such case. Therefore,
although the Examiner has tried to use his references to argue that,

"[t]he utility of claimed sodium phosphate co-transporter cannot be implicated
solely from homology to known sodium phosphate co-transporter or their protein
domains because the art does not provide teaching stating that all members of
family of sodium phosphate co-transporter must have the same effects, the same
ligands, and be involved in the same disease states, the art discloses evidence to
the contrary [emphasis added]." (OA 11/20/03 page 6 lines 11-17, lines 18-23;
OA 5/19/04 page 4 lines 6-11); and

"[t]herefore, references discussed above disclose the unpredictability of assigning
a function to a particular protein based on homology, especially one that belongs
to the family sodium phosphate co-transporter which have very different
ligand specificity and functions [emphasis added]." (OA 11/20/03 page 8 line
21 to page 9 line 2),

his statements find no factual support either in the references he cited or in the
knowledge in the field of sodium phosphate co-transporters. In contrast, a
preponderance of factual evidence supports Applicants/Appellants' assertion that
the claimed Npt2B polypeptide has utilities that are specific, substantial, and well-
established, as presented later in this Section C1.

As previously stated in the Summary of Claimed Subject Matter (Section V) and briefly 
restated here, the specification discloses a specific function for the claimed invention, 
which is a human type II sodium phosphate co-transporter that provides for the transport 
of sodium and phosphate ions from the intestinal lumen into the intestinal epithelial cells 
(page 4, lines 9 and 19-24). It does not require any other ligand for activity. This is a
unique function for Npt2B, since no other type II human intestinal sodium phosphate co-
transporter has been identified (see Xu et al., Biochim Biophys Acta 1567:97-105, 2002;
Werner & Kinne, J Physiol Regul Integr Comp Physiol 280(2):R301-312, 2001).
Evidence confirming this function and activity of Npt2B was provided in the Declaration
under 37 CFR § 1.132 by Suryanarayana Sankuratri, filed on February 23, 2004
("Declaration"). The claimed Npt2B protein was not, at the time of filing, an "orphan"
protein, i.e., a protein that has no known function, as asserted by the Examiner (see, e.g.,
OA 11/20/03 at page 6, line 1 and line 8; OA 5/19/04 at page 4, line 2). Therefore, the
facts do not support the Examiner's statement that "[i]n light of the specification the
skilled artisan can not come to any conclusions as to the function of claimed sodium
phosphate co-transporter of SEQ ID NO:1" (OA 11/20/03 page 4 lines 13-15).
Based on this unique and specific function of Npt2B, the specification discloses several
specific and substantial utilities for the claimed invention. One such utility for the
Npt2B polypeptide is its "use in various screening assays designed to identify therapeutic
agents [that modulate Npt2B activity]." (page 17, lines 9-10). This utility is "specific and
substantial." It is "specific" in that using Npt2B as a screening target will identify drug
candidates that specifically modulate Npt2B, rather than other proteins. This utility is
"substantial" because phosphate must be transported from the intestinal lumen across the
epithelium in order to be absorbed from the diet, and, as Npt2B is the sole known
intestinal phosphate transporter, modulation of its activity is capable of regulating
phosphate absorption by the entire body. Thus, screening drug candidates for modulators
of Npt2B is not merely an empty exercise, nor of academic interest alone, but one that
has immediate commercial applicability.² Further, this utility is credible: the Examiner
has presented no reason why one of ordinary skill in the art would fail to believe that
Npt2B is useful as a screening target, nor are Applicants/Appellants aware of any such
reason.

² Applicants/Appellants do not here assert that a particular drug is thus enabled; however,
Applicants/Appellants point out that the right to screen a target has immediate commercial
potential, e.g., for licensing to pharmaceutical companies. Consider, for example, the
revenues that Chiron Corp. has extracted from other companies for rights to screen
various HCV proteins for inhibitors.

Applicants/Appellants' claimed protein is also independently useful as an antigen for the
preparation of antibodies (see specification, page 21, lines 3-5). This utility is also
specific and substantial. It is "specific" to Npt2B because the proper generation of
antibodies that specifically bind Npt2B will result in antibodies that specifically bind only
to Npt2B, and not to other proteins or antigens. This utility is "substantial" because, inter
alia, such antibodies can be used as inhibitors of Npt2B directly, and thus are potential
drug candidates with immediate utility, as discussed above. Such antibodies are also
useful for other purposes, for example, for detecting the expression (or absence of
expression) of Npt2B, for labeling and sorting cells that express Npt2B, and the like. The
Examiner has presented no reason for doubting the utility of Npt2B used as an antigen to
prepare specific antibodies, nor are Applicants/Appellants aware of any such reason; thus,
this utility is also credible.

Specific diseases that are treatable by modulating Npt2B activity, either by modulatory
compounds or by antibodies, are disclosed in the specification on page 27, lines 19-29, and
were previously stated in the Summary of Claimed Subject Matter. For both utilities, the
claimed Npt2B polypeptide was able to provide benefit to the public as of the filing date
by identifying modulatory agents for the treatment of diseases associated with high
Npt2B activity, e.g., hyperphosphatemia, or with low Npt2B activity, e.g.,
hypophosphatemia (for a list of diseases, see page 27, lines 19-29).

Statements in both OA 11/20/03 and OA 5/19/04 show that the Examiner is apparently
under the misapprehension that it is necessary to correlate a disease state with a
dysfunctional form of the claimed protein (see, e.g., OA 11/20/03 at page 5, lines 1-5;
OA 5/19/04 at page 4, lines 2-4). This, however, is not necessary for the practice of
Applicants/Appellants' invention. Applicants/Appellants instead find that there is utility
in inhibition or stimulation of the natural, active Npt2B protein, just as many analgesics
inhibit the normal function of receptors or enzymes involved in the signal chain that
results in the perception of pain. Because Npt2B is uniquely situated to control
absorption of phosphate from the diet, modulation of Npt2B activity is capable of
affecting the amount of phosphate that is absorbed. Thus, in diseases or syndromes that
are characterized by excessive (or inadequate) levels of phosphate, modulation of Npt2B
activity will be therapeutic. For example, where renal failure causes hyperphosphatemia,
an Npt2B inhibitor can reduce the amount of phosphate absorbed, and thus alleviate that
symptom (see, e.g., Specification at page 27, lines 27-29). No dysfunction in Npt2B
itself is required: one can treat conditions of too much or too little phosphate that are not
due to a dysfunctional Npt2B, just as one can treat pain or inflammation that is not due
to a dysfunctional pain receptor.

As evidence to support the asserted function and utilities of the Npt2B protein, 
Applicants/Appellants submitted the Declaration of Suryanarayana Sankuratri, filed on 
February 23, 2004, which contained figures and data showing functional characteristics 
of the claimed Npt2B polypeptide derived by following the procedure disclosed in the 
specification, Experimental section B (page 29, line 23 to page 30, line 19). The 
Declaration presented data verifying the phosphate transporting activity claimed in the 
specification, which proves that the disclosure in the specification was fully enabled 
when filed. The use of the Declaration in this case is analogous to the use of the Michael 
Kluge declaration in In re Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436, 1441 (Fed. Cir.
1995) in that the data presented was used not to identify or establish a utility but to 
substantiate utility already asserted in the specification. The relevant wording in In re 
Brana is as follows: 

"Enablement, or utility is determined as of the application filing date. In re Glass, 
492 F.2d 1228, 1232, 181 USPQ 31, 34 (CCPA 1974). The Kluge declaration, 
though dated after applicants' filing date, can be used to substantiate any doubts 
as to the asserted utility since this pertains to the accuracy of a statement already 
in the specification. In re Marzocchi, 439 F.2d at 224 n.4, 169 USPQ at 370 n.4. 
It does not render an insufficient disclosure enabling, but instead goes to prove 
that the disclosure was in fact enabling when filed (i.e. demonstrated utility) 
[emphases added]." (In re Brana, 34 USPQ2d at 1441) 

In OA 5/19/04, the Examiner sought to disparage the evidence presented in the 
Declaration by stating "[i]n instant case post filing art cannot be used to establish utility 
because the results of said art were not known at the time of filing of instant application, 
and the information obtained was due to further experimentation [emphasis added]." 
(page 4 lines 18-21) The Examiner's position is contrary both to the facts of the present 
case and to case law. 
Further evidence of the asserted utilities of human Npt2B in the specification can be
found in the general knowledge in the area of Type II sodium phosphate
co-transporters, especially the Type IIb intestinal transporter, around the time of filing of
the priority application (February 9, 1999). Hilfiker et al. (Proc. Natl. Acad. Sci. U.S.A.
95:14564-14569, 1998, cited in the specification on page 2, lines 17-18 and page 29 lines
20-21), published in November 1998, described the cloning and characterization of
mouse Npt2B and was the first paper to classify the Type II sodium phosphate co-
transporters into the "Type IIa" family, represented by the renal isoforms, and the "Type
IIb" family, represented by the intestinal isoforms, which includes Npt2B. Xu et al.
(Genomics 62:281-284, 1999, cited by the previous Examiner in this case in the Notice of
Allowance sent on September 11, 2003) describes the cloning and characterization of
human Npt2B (referred to in the paper as Na/Pi-IIb) and was published ten months after
Applicants/Appellants' priority date. However, case law has stated that the "court has
approved use of later publications as evidence of the state of art existing on the filing date
of an application." In re Hogan and Banks, 559 F.2d 595, 605, 194 USPQ 527, 537
(CCPA 1977). Both papers describe their proteins (which have 78.8% and 99.9%
sequence identity to the Npt2B polypeptide of Claim 1) as the intestinal or Type IIb
sodium phosphate co-transporter. The significance the authors attributed to their
findings is represented by the following statement in the Abstract of Xu et al.:



"Phosphate plays a crucial role in cellular metabolism and its homeostatic 
regulation in intestinal and renal epithelial is critical. Apically expressed sodium-
phosphate (Na+-Pi) transporters play a critical role in this regulation. We have
isolated a cDNA (HGMW-approved symbol SLC34A2) encoding a novel human
small intestinal Na+-Pi transporter [emphasis added]."

Therefore, a person of ordinary skill in the art would, at the time of filing, immediately 
appreciate why the claimed Npt2B polypeptide is useful, based on the characteristics of 
the invention. Applicants/Appellants also submit that, based on the evidence presented
concerning the knowledge possessed by one skilled in the art at the time of filing, the
asserted specific and substantial utility for the claimed invention also qualifies as a well-
established utility.

Current knowledge in the area of Type IIb intestinal sodium phosphate transporters also
fully contradicts the Examiner's assertions that the claimed Npt2B polypeptide has no
specific or substantial utility. As set forth in the specification at page 29 lines 14-15,
Applicants/Appellants confirmed by RT-PCR that Npt2B is expressed in the small
intestine. To date, Npt2B is the only known Type II human sodium phosphate co-transporter
found in the small intestine (see Xu et al., Biochim Biophys Acta 1567:97-
105, 2002; Werner & Kinne, J Physiol Regul Integr Comp Physiol 280(2):R301-312,
2001; both references cited in the Amendment and Response submitted by
Applicants/Appellants on February 23, 2004). This fact demonstrates the unique
function of Npt2B and shows that its utility does not rest solely on homology arguments,
as asserted by the Examiner throughout both OA 11/20/03 and OA 5/19/04.

Furthermore, a recent article by Peerce et al. (Biochem. Biophys. Res. Commun. 301:8-12,
2003), cited in the Declaration of Suryanarayana Sankuratri, states that "[a]
pharmacological method of reducing intestinal phosphate absorption may provide a more
palatable approach to reducing serum phosphate and may slow the progression of
moderate chronic renal failure to end-stage renal failure. In the proximal small intestine
phosphate absorption occurs by a Na+-dependent mechanism ... [which] occurs through
the Na+/phosphate cotransporter. The Na+/phosphate cotransporter has been identified as
a 110-120 kDa polypeptide (references including Hilfiker et al. and Xu et al.)" This
statement in Peerce et al. supports and confirms the disclosure in the specification on 
page 27, lines 27-29 which reads, "[o]f particular interest is the use of the subject 
methods [of modulating Npt2B activity] to treat hyperphosphatemia resulting from renal 
insufficiency, e.g. caused by renal disease resulting in at least impaired renal function, 
and the like." 
In OA 11/20/03, the Examiner sought to support the rejection under 35 USC § 101 by
implying that the disclosure in the present specification was analogous to the situations
decided by the Courts in Brenner v. Manson, 383 U.S. 519, 148 USPQ 689 (1966) (page
11 lines 12-14; page 14 lines 7-12) and in In re Kirk, 376 F.2d 936, 153 USPQ 48 (CCPA
1967) (page 14 line 21 to page 15 line 7). These analogies are fallacious for the
following reasons. In Brenner, the applicant failed to disclose any utility for a process to
synthesize a steroid compound with no known utility other than as "a possible object of
scientific inquiry", and offered as evidence only a third-party article showing the utility of
a homologue of the subject steroid compound. In Kirk, the applicants claimed steroid
compounds said to have valuable "biological properties" and to be of value to the 
furtherance of steroidal research. In contrast, the present specification discloses a 
specific function of the Npt2B polypeptide and its use to identify agents to treat specific 
diseases. Furthermore, the specification asserts both specific and substantial utilities for 
the Npt2B polypeptide that are supported by both the prior art and present knowledge in 
the field. 



In summary, Applicants identified and sequenced Npt2B, determined its biological 
function, and set forth several utilities in the specification, including the use in screening 
assays and the use to generate antibodies. These utilities are unique to Npt2B, as it is the 
only sodium-phosphate co-transporter found in the human intestine, and therefore 
mediates all absorption of phosphate from the diet. Even the Examiner found that the
sodium-phosphate transporter family was diverse (see, e.g., OA 11/20/03 at page 4, line
1, and page 11, lines 17-19), and that one would not expect compounds that affect one
transporter to modulate another. The Declaration of Suryanarayana Sankuratri (Appendix, )
followed the procedures set forth in the specification and confirmed the screening
utility set forth therein. The Examiner, in contrast, has provided no reason for doubting
the asserted utility, and has provided no factual basis for believing that any of
Applicants/Appellants' utilities would not be substantial and specific to the claimed
invention.
Based on the arguments set forth, Applicants/Appellants submit that, under the Standard
of Rejection under 35 USC § 101, the Examiner has not met the burden of presenting a
prima facie case that the claimed invention lacks patentable utility by providing evidence
showing that one of ordinary skill in the art would reasonably doubt the utility asserted in
the specification. Even if the Board considers that the Examiner has met this initial
burden, Applicants/Appellants submit that sufficient rebuttal evidence has been provided
to convince one of skill in the art that the asserted utility was specific, substantial, and
well-established such that the utility requirement under 35 USC § 101 has been satisfied. 

A2. Enablement under 35 USC § 112, first paragraph 

In OA 11/20/03, the Examiner rejected claim 1 under 35 USC § 112, first paragraph, as
allegedly not enabling one skilled in the art how to use the claimed invention "since the 
invention is not supported by either a specific and substantial asserted utility or a well- 
established utility." The rejection of Claim 1 under 35 USC § 112, first paragraph 
(enablement) was maintained in OA 5/19/04. 

B2. Standard of Rejection under 35 USC § 112, first paragraph (enablement) 

Section 112, first paragraph, requires that the specification teach one of ordinary skill in
the art how to make and use the invention. As stated by the court in In re Marzocchi,
439 F.2d 220, 169 USPQ 367 (CCPA 1971):

"[A] specification disclosure which contains a teaching of the manner and process 
of making and using the invention in terms which correspond in scope to those 
used in describing and defining the subject matter sought to be patented must be 
taken as in compliance with the enablement requirement of the first paragraph of 
§ 112 unless there is reason to doubt the objective truth of the statements 
contained therein which must be relied on for enabling support." (439 F.2d at 223, 
169 USPQ at 369.)
C2. Application of Standard of Rejection under 35 USC § 112, first paragraph
(enablement) to Claim 1 

The Examiner's rejection under 35 USC § 112, first paragraph, was based solely on the
utility rejection under 35 USC § 101. Reiterating the arguments set forth in Section VII.C1
above, Applicants/Appellants submit that the Examiner did not meet the burden of
presenting a prima facie case that the claimed invention lacks patentable utility by 
providing evidence showing that one of ordinary skill in the art would reasonably doubt 
the utility asserted in the specification. Therefore Claim 1 satisfies the utility requirement 
of 35 USC § 101. 

Applicants/Appellants also assert that Claim 1 satisfies the enablement requirement of 35
USC § 112, first paragraph, since the specification would have taught one of ordinary skill
in the art how to make and use the invention at the time of filing. The amino acid 
sequence of the Npt2B polypeptide, as disclosed in Figure 1 and in SEQ ID NO: 1, 
enabled the artisan to generate antibodies which can modulate the activity of Npt2B to 
treat specific diseases of phosphate metabolism. Methods of preparing such antibodies 
are described in the specification starting from page 21, line 3 to page 23, line 12. Also, 
following the procedures described in the specification in Experimental Section B (page 
29, line 23 to page 30, line 19), would enable the artisan to express the Npt2B 
polypeptide and perform screening assays to identify Npt2B modulatory agents useful for 
the treatment of diseases of phosphate metabolism. Furthermore, the figures and data 
contained in the Declaration of Suryanarayana Sankuratri, which were obtained by 
following the procedures in Experimental Section B and which confirmed both the 
asserted activity and utility of the claimed invention, both demonstrate and prove that the 
specification was enabling at the time of filing. 

D. Conclusion 

The Examiner has failed to establish a prima facie showing that the asserted utility for Claim 1 is not specific or substantial. Applicants/Appellants have provided evidence showing that the asserted utility for the present invention was specific, substantial and well-established at the time of filing. Applicants/Appellants have also shown that the specification was fully enabling at the time of filing. Applicants/Appellants have therefore demonstrated that Claim 1 satisfies the utility requirement of 35 USC § 101 and the enablement requirement of 35 USC § 112, first paragraph. Accordingly, Applicants/Appellants request that the Board of Patent Appeals and Interferences reverse the rejection of Claim 1 on the grounds set forth herein.



Roche Palo Alto LLC 
Patent Law Dept. M/S A2-250 
3431 Hillview Avenue 
Palo Alto, CA 94304 

Direct Phone: (650) 855-5316
Facsimile: (650) 855-5322 
Date: September 23, 2005 



Respectfully submitted, 




David J. Chang, Ph.D. 
Reg. No. 50,374 
VIII. Claims Appendix

This Appendix contains the Claim involved in the Appeal: 

1. An isolated Npt2B polypeptide comprising the amino acid set forth in SEQ ID NO: 1.
IX. Evidence Appendix

This Appendix contains copies of: 

A. Declaration of Suryanarayana Sankuratri under 37 C.F.R. § 1.132 filed on February 23, 2004.

B. Hilfiker et al., "Characterization of a murine type II sodium-phosphate cotransporter 
expressed in mammalian small intestine", Proc. Natl. Acad. Sci. U.S.A. 95: 14564-14569, 
1998. 

This reference was entered by Examiner Peter Paras in the "Information Disclosure Citation" in the Notice of Allowance and Fee Due sent on September 11, 2003.

C. Xu et al., "Molecular cloning, functional characterization, tissue distribution, and 
chromosomal localization of a human, small intestinal sodium-phosphate (Na+-Pi) 
transporter (SLC34A2)", Genomics 62: 281-284, 1999.

This reference was cited by Examiner Peter Paras in the "Notice of References Cited" in 
the Notice of Allowance and Fee Due sent on September 11, 2003. 

D. Bork & Koonin, "Predicting functions from protein sequences - where are the 
bottlenecks?" Nature Genetics 18: 313-318, 1998. 

This reference was relied upon by the examiner as to grounds of rejection in the Office 
Action of November 20, 2003. 

E. Karp, "What we do not know about sequence analysis and sequence database",
Bioinformatics 14(9): 753-754, 1998. 

This reference was relied upon by the examiner as to grounds of rejection in the Office 
Action of November 20, 2003. 
F. Bork & Eisenberg, "Sequences and topology: deriving biological knowledge from genomic sequences", Current Opinion in Structural Biology 8: 331-332, 1998.
This reference was relied upon by the examiner as to grounds of rejection in the Office 
Action of November 20, 2003. 
Address to:
Assistant Commissioner for Patents
Washington, D.C. 20231

Group Art Unit: 1646
Examiner Name: Nirmal Singh Basi
Title: Human Intestinal Npt2B

I, Suryanarayana Sankuratri, do hereby declare and state: 


1. I am a biochemist currently employed as Principal Research Scientist at Roche Palo Alto,
LLC, Palo Alto, California, and am a co-inventor of the claims of the above-identified patent 
application. I directed others and personally performed the research leading to the invention 
disclosed and claimed therein. My professional experience, educational background, 
professional activities, and publications are detailed in the curriculum vitae attached hereto. 

2. I have read the Office Action dated November 20, 2003, in this application and 
understand that the Examiner has rejected pending Claim 1 on the assertion that the claimed 
invention is not supported by either a specific and substantial asserted utility or a well-established utility. This assertion is incorrect for the reasons set forth below.

3. Using the procedure that was disclosed in the application (page 29, line 25 to page 30, line 19), we were able to show that CHO cells which express the human Npt2B protein of this
invention were able to transport phosphate ions as measured by the amount of radioactive 
phosphate taken up by the cells, whereas CHO cells not expressing Npt2B did not transport 
phosphate. This result, graphically represented in Fig. 1, clearly demonstrated that Npt2B is 
a phosphate transporter. 

4. We then compared the biochemical characteristics of Npt2B with those of the renal phosphate transporter, Npt2A. As seen in Fig. 2, both transporters required the presence of sodium in order to transport phosphate. Fig. 3 compares the phosphate uptake kinetics of the two transporters, showing that Npt2B demonstrated higher affinities for both sodium and phosphate ions than Npt2A. In fact, Km measurements for sodium and phosphate uptake for Npt2B were remarkably similar to those obtained from intact intestinal membrane vesicles (Peerce BE, Biochim Biophys Acta 1239: 1-10, 1995).

5. We also compared the pH dependence of phosphate transport between Npt2A and Npt2B. 
As seen in Figure 4, the two transporters had opposite responses to pH changes, with Npt2B 
showing decreased phosphate uptake as the assay conditions shifted from acidic to alkaline 
whereas Npt2A showed increased phosphate uptake as pH increased. This pH dependence is 
one of the characteristic features of the intestinal phosphate transporter, as characterized by many laboratories using intestinal membrane preparations.

6. We conducted a Northern blot analysis (Fig. 5) which clearly showed that Npt2B was 
expressed in the ileum, jejunum and duodenum but Npt2A could not be detected in these 
areas of the intestine. It is now well-established in the scientific community that Npt2B is the 
protein involved in intestinal sodium-dependent phosphate absorption. It is also well- 
accepted by researchers in the field that complications such as hyperphosphatemia which 
could both cause and be caused by renal disease can be treated by reducing the amount of 
phosphate absorption from the intestine. A recent article by Peerce et al. (Biochem Biophys 
Res Commun. 301: 8-12, 2003, attached hereto) states, "A pharmacological method of
reducing intestinal phosphate absorption may provide a more palatable approach to reducing 
serum phosphate and may slow the progression of moderate chronic renal failure to end-stage 
renal failure." As Npt2B is responsible for most of the phosphate absorbed from the diet, it is 
uniquely situated to control the amount of phosphate that enters the system. Therefore, the 
use of Npt2B in a screening assay to identify inhibitors of the transporter would be of 
significant importance. 

7. Using CHO cells expressing Npt2B, we were successful in identifying a number of 
compounds which were able to inhibit Npt2B activity at low micromolar concentrations (Fig. 
6). These and other newly identified inhibitors of Npt2B could play a significant role in the 
treatment of diseases characterized by hyperphosphatemia. 

8. I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made 
are punishable by fine or imprisonment, or both, under Section 1001 of Title XVIII of the
United States Code, and that such willful false statements may jeopardize the validity of the
application or any patent issuing thereon. 
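Paragraph 4 rests on the usual reading of Km: under Michaelis-Menten kinetics, uptake is v = Vmax * S / (Km + S), so a transporter with a lower Km (higher affinity) reaches a larger fraction of its Vmax at low substrate concentrations. The minimal Python sketch below illustrates only that relationship; the Km, Vmax, and phosphate values are hypothetical placeholders, not figures from the declaration.

```python
def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten uptake rate: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


# Hypothetical values for illustration only; the declaration reports no numbers here.
VMAX = 1.0                # normalized maximal uptake rate
KM_HIGH_AFFINITY = 0.05   # mM, hypothetical Km of a higher-affinity transporter
KM_LOW_AFFINITY = 0.20    # mM, hypothetical Km of a lower-affinity transporter

s = 0.1  # mM phosphate, hypothetical substrate concentration
v_hi = mm_rate(s, VMAX, KM_HIGH_AFFINITY)
v_lo = mm_rate(s, VMAX, KM_LOW_AFFINITY)
print(f"higher affinity: {v_hi:.3f}, lower affinity: {v_lo:.3f}")
```

With these hypothetical numbers the lower-Km transporter runs at two-thirds of Vmax, while the higher-Km transporter runs at one-third.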
Fig. 1 Heterologous expression of the novel phosphate transporter in mammalian cells
Fig. 3 Equilibrium constant (Km) for the novel phosphate transporter human Npt IIb and human renal Npt IIa
Fig. 4 pH-dependent activity of novel transporter
Fig. 5 Northern Analysis for new transporter expression
Fig. 6 Npt IIb selective inhibitors
Abstract

Hyperphosphatemia and secondary hyperparathyroidism are common and severe complications of chronic renal failure. Reduced dietary phosphorus has been shown to be an effective treatment in reducing serum phosphate and serum PTH. 2'-Phosphophloretin (2'-PP) inhibited small intestine apical membrane Na+/phosphate cotransport and reduced serum phosphate in adult rats. 2'-PP and phosphoesters of phloretin were tested for inhibition of human small intestine brush border membrane alkaline phosphatase activity and for inhibition of Na+-dependent phosphate uptake. The IC50 values for inhibition of alkaline phosphatase suggested an order of inhibitory potency of 4-PP > phloretin > 4'-PP > 2'-PP. Inhibition of Na+-dependent phosphate uptake followed the sequence 2'-PP >> 4'-PP > 4-PP > phloretin. These results are consistent with 2'-PP being a specific inhibitor of human intestinal brush border membrane Na+/phosphate cotransport.
© 2002 Elsevier Science (USA). All rights reserved. 

Keywords: Human intestinal brush border membrane; Intestinal phosphate absorption; Na+-dependent phosphate uptake; Na+/phosphate cotransporter; 2'-Phosphophloretin
In chronic renal failure, phosphate retention and deposition as calcium phosphate precipitates contribute to interstitial injury, renal tubule injury, and cardiac disease [1-4]. Very low phosphorus diets in combination with phosphate binding compounds have been shown to slow the progression of renal failure. A pharmacological method of reducing intestinal phosphate absorption may provide a more palatable approach to reducing serum phosphate and may slow the progression of moderate chronic renal failure to end-stage renal failure.

In the proximal small intestine, phosphate absorption occurs by a Na+-dependent mechanism and a Na+-independent process. Na+-dependent phosphate uptake occurs through the Na+/phosphate cotransporter. The Na+/phosphate cotransporter has been identified as a 110-120 kDa polypeptide [5-8]. The mechanism of Na+-independent uptake is unknown.



*Corresponding author. Fax: 1-409-772-3381.
E-mail address: bpeerce@UTMB.edu (B.E. Peerce). 



A phosphate ester of phloretin has been shown to inhibit rat and rabbit intestinal brush border membrane vesicle Na+-dependent phosphate uptake [9]. 2'-PP inhibition of brush border membrane (BBM) Na+-dependent phosphate uptake required Na+ and was sensitive to external phosphate. In vivo, 2'-PP reduced plasma phosphate in rats in a concentration-dependent manner. We have extended our studies of the effect of phosphophloretins to human BBM alkaline phosphatase activity and phosphate uptake into human BBM vesicles.



Materials and methods 

Materials. Chemicals used in the synthesis of 2'-PP, 4'-PP, and 4-PP were purchased from Aldrich Chemical, Milwaukee, WI. 3-(4-Hydroxyphenyl)-propionitrile was purchased from Lancaster Chemical, Lancaster, PA. All organic solvents were purchased from Aldrich Chemical, Milwaukee, WI and were of reagent grade or better. Membrane filters were purchased from Millipore, Boston, MA. [32P]Phosphate was purchased from DuPont/NEN, Wilmington, DE. Salts and reagents used in the preparation and assay of brush border membrane vesicles were purchased from Fisher Chemical, Houston, TX.

Methods 

Preparation of brush border membrane vesicles. Human intestine removed during surgical procedures was scraped and the mucosa was stored in 300 mM mannitol and 10 mM Hepes/Tris, pH 7.5, at liquid N2 temperature until needed. Brush border membrane vesicles were prepared by Ca2+ precipitation and differential centrifugation as previously described [5,6,10-15]. Purification of brush border membranes was assayed using the brush border membrane enzyme markers sucrase [16] and alkaline phosphatase [17]. During the course of these studies, enrichment in brush border membrane enzymes varied between 20- and 28-fold.
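Marker-enzyme enrichment is conventionally expressed as the ratio of the enzyme's specific activity (activity per mg protein) in the membrane preparation to that in the starting homogenate. The article does not spell out its formula, so that definition is an assumption here, and the sucrase numbers in this minimal Python sketch are hypothetical.

```python
def fold_enrichment(activity_membrane: float, protein_membrane: float,
                    activity_homogenate: float, protein_homogenate: float) -> float:
    """Ratio of the marker enzyme's specific activity (activity per mg protein)
    in the purified membrane fraction to that in the starting homogenate."""
    specific_membrane = activity_membrane / protein_membrane
    specific_homogenate = activity_homogenate / protein_homogenate
    return specific_membrane / specific_homogenate


# Hypothetical sucrase activities (units) and protein amounts (mg), illustrative only.
fold = fold_enrichment(activity_membrane=120.0, protein_membrane=2.5,
                       activity_homogenate=500.0, protein_homogenate=250.0)
print(fold)
```

Here 48 units/mg in the membrane fraction against 2 units/mg in the homogenate gives a 24-fold enrichment, inside the 20- to 28-fold range reported above.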

Synthesis of phosphophloretin derivatives. 2'-Phosphophloretin (2'-PP) was synthesized from phloridzin [9]. 2'-PP was analyzed by mass spectrometry, 31P NMR, 13C NMR, and 1H NMR [9]. 1H NMR (400 MHz, d6-DMSO) δ 13.0 (s, 1H), 10.7 (br s, 1H), 9.2 (br s, 1H), 7.03 (d, J = 8.6 Hz, 2H), 6.64 (d, J = 8.4 Hz, 2H), 6.63 (dd, J = 1.2, 2.1 Hz, 1H), 6.04 (d, J = 2.4 Hz, 1H), 3.27 (t, J = 7.2 Hz, 2H), 2.77 (t, J = 7.6 Hz, 2H); 31P NMR δ -4; ESMS m/z 355 (M + H); and melting point = 170-171 °C.

4'-Phosphophloretin (4'-PP) was synthesized from 2,6-dihydroxy-4-phospho benzene and 4-hydroxy phenyl propionyl nitrile [18]. The 4'-phosphoester was resolved from the 2'-phosphoester by chromatography on silica gel using hexanes:dichloromethane:ethyl acetate (50:25:25). 2,6-Dihydro-4-phospho benzene was synthesized from phloroglucinol and dibenzyl phosphite in acetonitrile and triethylamine [19]. Prior to reaction with dibenzyl phosphite, phloroglucinol was dried at 105 °C under vacuum for 7 days. 2,6-Dihydro-4-phospho benzene was isolated by column chromatography on Dowex 1 using 25% methanol to elute the column. 4'-Phosphophloretin was purified by silica gel column chromatography developed with hexanes:dichloromethane:ethyl acetate (60:25:15). 4'-Phosphophloretin was analyzed by NMR and mass spectrometry. 1H NMR (750 MHz, d6-DMSO) δ 13.5 (s, 1H), 9 (br s, 1H), 7.08 (d, J = 8.2 Hz, 2H), 7.06 (d, J = 8.2 Hz, 2H), 6.74 (s, 2H), 6.65 (d, J = 8.2 Hz, 2H), 6.62 (d, J = 8.2 Hz, 2H), 2.7 (t, J = 7.5 Hz, 5.1 Hz, 2H), 1.22 (s, 2H); 31P NMR δ -4.8; ESMS m/z 355 (M + H); and melting point 178-179 °C.

4-Phosphophloretin (4-PP) was synthesized from 3-(4-dibenzyl phosphophenyl) propionyl chloride and phloroglucinol by Friedel-Crafts acylation in DMSO with anhydrous AlCl3 [9,18]. The carboxylic acid of 3-(4-hydroxy)-cinnamic acid (5 g) was reacted with benzyl bromide in HMPT (hexamethylphosphoric triamide) for 1 h at 23 °C. The benzoate was collected and recrystallized from ethanol. The benzoate (5.04 g, 20 mmol) was added to 50 ml N,N-dimethylacetamide and cooled to 4 °C with stirring, and NaH was added (0.64 g, 25 mmol). The mixture was brought to 23 °C and 10 ml CCl4 was added. Dibenzyl phosphite (5.6 g, 25.8 mmol) in 25 ml N,N-dimethylacetamide was added and stirring was continued for 1 h at 23 °C. The reactants were diluted with 0.2 M acetate buffer, pH 4 (200 ml) and the dibenzyl phosphate ester was partitioned between water:hexane:ethyl acetate (50:25:25). The dibenzyl phosphate ester was reduced in volume, purified by chromatography on a silica gel column eluted with a 25-50% ethyl acetate gradient in hexanes, and dried at 75 °C under vacuum. The benzyl protecting groups were cleaved by catalytic hydrogenation with H2 gas in ethyl acetate (100 ml) and 200 mg Pd/C for 24 h. 4-PP was purified as previously described [9]. 3-(4-Phosphophenyl) propionyl chloride was synthesized from 3-(4-hydroxy) cinnamic acid and dibenzyl phosphite [19]. 1H NMR (400 MHz, d6-DMSO) δ 10.5 (br s, 1H), 9.2 (br s, 2H), 7.02 (d, 2H, J = 8.2 Hz), 6.8 (d, 2H, J = 8.2 Hz), 6.64 (d, 2H, J = 8.4 Hz), 6.6 (dd, J = 2.5, 1.5 Hz, 1H), 6.04 (d, J = 2.5 Hz, 1H), 3.3 (t, J = 7.2 Hz, 2H), 2.7 (t, J = 7.5 Hz); 31P NMR δ -4.8; ESMS m/z 355 (M + H); and melting point 182 °C.

Phosphorylated phloretin derivatives were analyzed by thin layer chromatography using silica gel and methanol:H2O (1:3) as the developing solvent. Spots were identified by UV absorption and I2 [20] and visualized for phosphate esters using Hanes reagent [21]. Phosphophloretin derivatives were single spots and judged to be 90-94% of the UV absorbing material.

Na+-dependent brush border membrane vesicle uptakes. Na+-gradient driven uptakes of phosphate, alanine, or glucose into intestinal brush border membrane vesicles were performed using a rapid-mixing, rapid-filtering device as previously described [5,6,9-15]. Na+-dependent phosphate uptake into brush border membrane vesicles was performed using 100 µM [32P]phosphate, 100 mM mannitol, 10 mM Hepes/Tris, pH 7.5, 100 mM NaCl or 100 mM KCl (uptake buffers). Na+-dependent glucose uptake was determined using 100 µM [3H]glucose, 10 mM Hepes/Tris, pH 7.5, 100 mM mannitol, and 100 mM NaCl or 100 mM KCl. Na+-dependent alanine uptake was determined using 100 µM [3H]alanine, 100 mM mannitol, 10 mM Hepes/Tris, pH 7.5, and 100 mM NaCl or 100 mM KCl. Uptakes were performed at 23 °C using 100 µg of brush border membrane protein.

Experiments examining the effect of phosphophloretin derivatives on Na+-dependent uptakes were performed as described above using 10 nM to 10 µM phosphophloretin dissolved in 10 mM KOH:borate, pH 6.5. Phosphophloretin was added to the uptake solution immediately prior to addition of protein. In some experiments the effect of external phosphate on phosphophloretin inhibition of Na+-dependent phosphate uptake was examined. In these experiments, phosphate concentration was varied between 25 and 500 µM. The effect of phosphate concentration on phosphophloretin inhibition of Na+-dependent [32P]phosphate uptake into intestinal brush border membrane vesicles was analyzed using the non-linear regression program Enzfitter (Elsevier Biosoft, Cambridge, UK).

In some experiments the time course of phosphate uptake into human intestinal BBMV was examined. Uptake of phosphate into BBMV was determined between 3 s and 30 min at 23 °C. Na+-dependent uptakes were defined as uptake in the presence of NaCl minus uptake in the presence of KCl. All uptakes were performed in triplicate and the results are expressed as means ± SE.
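The subtraction that defines Na+-dependent uptake can be sketched in Python as follows. This is an illustration, not the authors' analysis: the replicate values are hypothetical, and combining the two standard errors in quadrature assumes the NaCl and KCl replicates are independent, which the article does not state.

```python
import math
from statistics import mean, stdev


def mean_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def na_dependent_uptake(nacl, kcl):
    """Na+-dependent uptake = mean uptake in NaCl minus mean uptake in KCl.
    The SE of the difference assumes the two replicate sets are independent."""
    m_na, se_na = mean_se(nacl)
    m_k, se_k = mean_se(kcl)
    return m_na - m_k, math.sqrt(se_na ** 2 + se_k ** 2)


# Hypothetical triplicate uptakes (arbitrary units), not data from the article.
diff, se = na_dependent_uptake([10.0, 11.0, 12.0], [2.0, 2.0, 2.0])
print(f"Na+-dependent uptake = {diff:.2f} +/- {se:.2f}")
```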

Measurement of BBM alkaline phosphatase activity. Intestinal BBM alkaline phosphatase activity was measured using 1 mM p-nitrophenylphosphate and 100 µg BBM protein as previously described [17]. In experiments examining the effect of phosphophloretin derivatives on alkaline phosphatase activity, the indicated phosphophloretin derivative was varied between 100 nM and 100 µM.



Results 

Effect of phosphophloretins on Na+-dependent phosphate uptake

The time course of phosphate uptake into human intestinal BBMV is shown in Fig. 1. Phosphate uptakes into BBMV in the presence of NaCl (closed circles, solid line), in the presence of KCl (open squares, dashed line), and in the presence of NaCl and 100 nM 2'-PP (open circles, solid line) are shown. Fig. 1 shows a 7-fold overshoot for phosphate uptake over equilibrium phosphate uptake in the presence of NaCl. Addition of 100 nM 2'-PP resulted in a 75-80% decrease in phosphate uptake without affecting phosphate uptake at equilibrium. During the course of these studies, the phosphate overshoot of equilibrium phosphate accumulation varied between 5- and 12-fold (mean = 7.8-fold, n = 5).
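The overshoot figure is a ratio: the peak of the uptake time course divided by the equilibrium uptake. A minimal Python sketch, assuming the latest time point (30 min here) is taken as equilibrium and using hypothetical time-course values:

```python
def overshoot_fold(time_course):
    """Peak uptake divided by the equilibrium uptake, taken here as the value
    at the last (longest) time point of the course."""
    ordered = sorted(time_course)  # (time, uptake) pairs sorted by time
    peak = max(uptake for _, uptake in ordered)
    equilibrium = ordered[-1][1]
    return peak / equilibrium


# Hypothetical (time in s, uptake in arbitrary units) pairs, illustrative only.
course = [(3, 4.0), (15, 7.0), (60, 5.0), (300, 2.0), (1800, 1.0)]
print(overshoot_fold(course))
```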
Fig. 1. Time course of phosphate uptake into human small intestinal BBMV. [32P]Phosphate uptake into human small intestine BBMV was determined as described in Materials and methods. Phosphate uptakes in the presence of NaCl (closed circles, solid line), in the presence of KCl (open circles, dotted line), and in the presence of NaCl + 100 nM of 2'-PP (inverse triangles, dashed line) were determined following 3 s to 30 min incubations at 23 °C. Results are means ± SE of triplicate determinations and representative of five experiments.

Fig. 2 shows the effect of 2'-phosphophloretin concentration on Na+-dependent transport into BBM vesicles. 2'-PP inhibited Na+-dependent phosphate uptake (solid circles, broken line) in a concentration-dependent manner with an apparent IC50 of 38 ± 6 nM (n = 4). Na+-dependent glucose uptake (solid circles, solid line) and Na+-dependent alanine uptake (open circles, solid line) were not affected by 2'-PP at concentrations 10 times that required for greater than 90% inhibition of Na+-dependent phosphate uptake.

Fig. 2. Effect of 2'-PP on Na+-dependent cotransport. Na+-dependent [32P]phosphate (solid circles, dashed line), Na+-dependent [3H]glucose (solid circles, solid line), or Na+-dependent [3H]alanine (open circles, solid line) uptakes were determined as described in Materials and methods. Results are means ± SE of triplicate determinations and representative of three separate experiments.
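An apparent IC50 like the one reported for 2'-PP is commonly estimated by fitting a one-site inhibition curve, v/v0 = 1/(1 + [I]/IC50), to fractional uptake data. The article does not state its fitting model, so the model (Hill slope of 1), the simple grid-search least-squares fit used here in place of a dedicated non-linear regression program, and the noise-free synthetic data are all assumptions for illustration.

```python
def fractional_activity(i_conc, ic50):
    """One-site inhibition model with a Hill slope of 1: v/v0 = 1 / (1 + [I]/IC50)."""
    return 1.0 / (1.0 + i_conc / ic50)


def fit_ic50(conc, activity, lo=1e-3, hi=10.0, steps=4000):
    """Least-squares IC50 estimate by a log-spaced grid search between lo and hi
    (same units as conc). A crude stand-in for proper non-linear regression."""
    best_ic50, best_sse = lo, float("inf")
    for k in range(steps + 1):
        ic50 = lo * (hi / lo) ** (k / steps)
        sse = sum((a - fractional_activity(c, ic50)) ** 2
                  for c, a in zip(conc, activity))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50


# Synthetic, noise-free data generated with IC50 = 0.038 uM (38 nM), illustrative only.
conc_um = [0.01, 0.03, 0.1, 0.3, 1.0]
activity = [fractional_activity(c, 0.038) for c in conc_um]
print(round(fit_ic50(conc_um, activity), 3))
```

With real, noisy triplicate data one would normally also fit the uninhibited rate and, if warranted, a Hill slope, rather than fixing both as this sketch does.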

Studies examining the effect of phosphophloretins on Na+-dependent phosphate uptake and alkaline phosphatase activity are summarized in Table 1. Na+-dependent phosphate uptake was insensitive to 4-PP and phloretin at concentrations below 100 µM. Addition of 4'-PP resulted in a 15% ± 4% (n = 3) inhibition of Na+-dependent phosphate uptake at 500 nM 4'-PP. In contrast, Na+-dependent phosphate uptake was inhibited more than 90% at 2'-PP concentrations above 100 nM. All of the phloretin derivatives examined were weak inhibitors of intestinal BBM alkaline phosphatase activity.

Table 1
Effect of phosphorylated aromatics on Na+-dependent phosphate uptake

                                      Na+-dependent phosphate uptake        Alkaline phosphatase activity
Compound    Substituents              IC50 (µM)        % Change             IC50 (mM)
2'-PP       R1 = HPO4; R2 = R3 = H    0.038 ± 0.006    Inhibition 92 ± 4    1.25 ± 0.25
4'-PP       R2 = HPO4; R1 = R3 = H    NM               Inhibition 15 ± 4    0.96 ± 0.08
4-PP        R3 = HPO4; R1 = R2 = H    0.185 ± 0.02     Stimulation 38 ± 12  0.350 ± 0.08
Phloretin                             NM                                    0.692 ± 0.058

NM, not measurable. Results are means ± SE of triplicate determinations and three separate experiments. p-Nitrophenyl phosphate concentration was 1 mM.

Fig. 3. Effect of [phosphate] on 2'-PP inhibition of Na+-dependent phosphate uptake. Na+-dependent [32P]phosphate uptake into intestinal brush border membrane vesicles was determined as described in Materials and methods. External phosphate concentration was 50 µM (open circles), 100 µM (closed circles), or 250 µM (solid triangles). 2'-PP concentration was varied between 10 nM and 1 µM. Results are means ± SE of triplicate determinations and representative of three experiments.

Effect of external phosphate concentration on 2'-PP inhibition of Na+-dependent phosphate uptake

The effect of external phosphate concentration on 2'-PP inhibition of Na+-dependent phosphate uptake is shown in Fig. 3. Fig. 3 is a Dixon plot of the effect of 50 µM phosphate (solid circles), 100 µM phosphate (open circles), and 250 µM phosphate (solid triangles) on 2'-PP inhibition of Na+-dependent phosphate uptake. Increasing the external phosphate concentration decreased 2'-PP inhibition of Na+-dependent phosphate uptake. The effect of phosphate concentration on 2'-PP inhibition of brush border Na+-dependent phosphate uptake was analyzed by the method of Cornish-Bowden [22] at 50, 100, and 250 µM of 2'-PP. The intercept of the three straight lines was above the x-axis and to the right of the y-axis, which is consistent with mixed inhibition by 2'-PP [23].

Discussion 

2'-PP inhibition of Na+-dependent phosphate uptake was measured in proximal small intestine brush border membrane vesicles isolated from human small intestine.

The apparent IC50 for 2'-PP inhibition of Na+-dependent phosphate uptake was 38 ± 8 nM (Fig. 2). The apparent IC50 for 2'-PP inhibition of human intestinal BBMV Na+-dependent phosphate uptake was similar to that reported for rabbit intestinal brush border membrane vesicles and rat intestinal brush border membrane vesicles [11].

The effect of 2'-PP was specific for the Na+/phosphate cotransporter and specific for 2'-PP. Na+-independent phosphate uptake, Na+-dependent glucose uptake, and Na+-dependent alanine uptake were not affected by 2'-PP addition to the uptake media (Fig. 2). 4'-PP and 4-PP did not alter Na+-dependent phosphate uptake into human intestinal brush border membrane vesicles, indicating that the effect of phosphophloretin on the Na+/phosphate cotransporter was specific for the 2'-isomer (Table 1).

Table 1 indicates that BBMV esterase activity did not contribute to phosphophloretin inhibition of Na+-dependent phosphate uptake. Although the phosphophloretin derivatives were inhibitors of intestinal BBMV alkaline phosphatase activity, the order of inhibitor potency for phosphophloretin inhibition of alkaline phosphatase was different from the order of inhibitor potency for phosphophloretin inhibition of Na+-dependent phosphate uptake. Phosphophloretin inhibition of alkaline phosphatase hydrolysis of p-nitrophenyl phosphate followed the sequence: 4-PP > 4'-PP > 2'-PP > phloretin (Table 1). Phosphophloretin inhibition of Na+-dependent phosphate uptake followed the sequence: 2'-PP >> 4'-PP > phloretin > 4-PP (Table 1). The effect of phloretin on human proximal small intestine brush border membrane Na+-dependent phosphate uptake was similar to the effect of phloretin on Na+-dependent phosphate uptake into K562 cells and human erythrocytes [24], and into rabbit proximal small intestine brush border membrane vesicles [9].

The effect of external phosphate on 2'-PP inhibition of Na+-dependent phosphate uptake appeared to be competitive. Increasing external phosphate decreased 2'-PP inhibition of Na+-dependent phosphate uptake (Fig. 3). Further examination of the results in Fig. 3 by a plot of the slope from Fig. 3 versus the reciprocal of the phosphate concentration and the method of Cornish-Bowden [22,23] indicated mixed inhibition (Vmax and Km inhibited).
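For orientation, the standard mixed-inhibition rate equation, v = Vmax*S / (Km*(1 + [I]/Ki) + S*(1 + [I]/(alpha*Ki))), is assumed below; the article cites Cornish-Bowden but does not write the equation out. Under that model, Dixon-plot lines (1/v versus [I]) for different substrate concentrations cross at [I] = -Ki, at a height of (1 - 1/alpha)/Vmax, so the crossing lies above the x-axis when alpha > 1. The kinetic constants in this Python sketch are hypothetical.

```python
def dixon_line(s, vmax, km, ki, alpha):
    """Slope and intercept of 1/v versus [I] for the mixed-inhibition equation
    v = Vmax*S / (Km*(1 + I/Ki) + S*(1 + I/(alpha*Ki)))."""
    slope = (km / (s * ki) + 1.0 / (alpha * ki)) / vmax
    intercept = (km / s + 1.0) / vmax
    return slope, intercept


def intersection(line1, line2):
    """Point where two straight lines y = m*x + b cross."""
    (m1, b1), (m2, b2) = line1, line2
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1


# Hypothetical constants (uM), chosen only to illustrate the plot geometry.
VMAX, KM, KI, ALPHA = 1.0, 100.0, 0.04, 2.0
x, y = intersection(dixon_line(50.0, VMAX, KM, KI, ALPHA),
                    dixon_line(100.0, VMAX, KM, KI, ALPHA))
print(f"lines cross at [I] = {x:.3f}, 1/v = {y:.3f}")
```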

The effect of 2'-PP on the Vmax for Na+-dependent phosphate uptake may be due to the off rate of 2'-PP from the Na+-loaded cotransporter. If 2'-PP dissociates much more slowly than the 3 s used for measurements of Na+-dependent phosphate uptake, the cotransporter:Na+:2'-PP complex would effectively be a dead-end complex. The resultant removal of a significant percentage of the cotransporter as a dead-end complex would result in a decrease in the apparent transport velocity and a decreased Vmax.
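The dead-end-complex argument can be made semi-quantitative with a simple model, assumed here rather than taken from the article: a fraction of cotransporters starts inhibitor-bound, the complex dissociates with first-order rate constant k_off, and rebinding is ignored. Transporters still bound at the end of the 3 s window contribute nothing, so the apparent Vmax falls toward Vmax*(1 - fraction bound) as k_off*t becomes small. All numbers in this Python sketch are hypothetical.

```python
import math


def apparent_vmax(vmax, fraction_bound, k_off, window_s):
    """Apparent Vmax when a fraction of transporters starts in an inhibitor complex
    that dissociates by first-order kinetics (k_off in 1/s) during the measurement
    window. Rebinding is ignored, and a transporter released at any point in the
    window is counted as fully active, so this is an upper bound."""
    released = 1.0 - math.exp(-k_off * window_s)
    still_bound = fraction_bound * (1.0 - released)
    return vmax * (1.0 - still_bound)


# Hypothetical: half the cotransporters inhibitor-bound, 3 s uptake window.
for k_off in (10.0, 1.0, 0.001):
    print(k_off, round(apparent_vmax(1.0, 0.5, k_off, 3.0), 3))
```

With k_off = 10 s^-1 the apparent Vmax is essentially unchanged, while with k_off = 0.001 s^-1 it stays close to half of Vmax, which is the dead-end behavior described in the paragraph above.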
ABSTRACT An isoform of the mammalian renal type II Na/Pi-cotransporter is described. Homology of this isoform to described mammalian and nonmammalian type II cotransporters is between 57 and 75%. Based on major diversities at the C terminus, the new isoform is designated as type IIb Na/Pi-cotransporter. Na/Pi-cotransport mediated by the type IIb cotransporter was studied in oocytes of Xenopus laevis. The results indicate that type IIb Na/Pi-cotransport is electrogenic and, in contrast to the renal type II isoform, of opposite pH dependence. Expression of type IIb mRNA was detected in various tissues, including small intestine. The type IIb protein was detected as a 108-kDa protein by Western blots using isolated small intestinal brush border membranes and by immunohistochemistry was localized at the luminal membrane of mouse enterocytes. Expression of the type IIb protein in the brush borders of enterocytes and transport characteristics suggest that the described type IIb Na/Pi-cotransporter represents a candidate for small intestinal apical Na/Pi-cotransport.



The kidney and the small intestine are important (externally oriented) control sites to maintain and balance the extracellular concentration of inorganic phosphate (Pi). In the kidney, reabsorption of filtered Pi occurs in the proximal tubule via apically located Na/Pi-cotransporters. Two dissimilar Na/Pi-cotransporters, named type I and type II, have been identified and have been shown to be expressed in the apical membrane of proximal tubular cells (1). As demonstrated recently by targeted inactivation, the type II Na/Pi-cotransporter represents the major pathway by which Pi is reabsorbed (1, 2). With the exception of osteoclasts (3), expression of the type II cotransporter has not yet been described other than in proximal tubules.

In addition to the well characterized renal handling of Pi, an understanding of whole body Pi-homeostasis necessitates elucidating the entry step of Pi in the small intestine (apical Na/Pi-cotransport). However, until now, the molecular identity of a mammalian small intestinal apical Na/Pi-cotransporter has not been described. Although expression of type III Na/Pi-cotransporter mRNA (retroviral receptors Glvr-1 and Ram-1; ref. 4) has been reported in small intestine, as yet, there is no evidence that these Na/Pi-cotransporters are expressed in the apical membrane. Rather, it seems that type III cotransporters are expressed ubiquitously in epithelial and nonepithelial cells.

Based on an expressed sequence tag (EST) clone derived from a cDNA library of murine embryonic cells, we have obtained a functional full length clone coding for a mammalian isoform of the renal type II Na/Pi-cotransporter, which was named type IIb Na/Pi-cotransporter. Expression of type IIb mRNA was found in a variety of tissues, including small intestinal mucosa. By immunohistochemistry, expression of the type IIb protein was localized at the brush border membrane of enterocytes. Transport characteristics of type IIb-mediated Na/Pi-cotransport were similar to the ones described for small intestinal Na/Pi-cotransport (5, 6). Our data suggest that the described type IIb Na/Pi-cotransporter may represent the (a) small intestinal apical Na/Pi-cotransporter.

MATERIALS AND METHODS 

Sequencing and Rapid Amplification of 5'-cDNA Ends. An EST-clone (Genome Systems, St. Louis; clone AA647858) with an insert of 3.4 kilobases (kb) was sequenced on both strands. Sequence comparison with the mouse renal type II Na/Pi-cotransporter (7, 8) suggested that ~700 bp were missing at the 5' end. To obtain the full length cDNA, rapid amplification of 5' cDNA ends was performed as follows: Total RNA (10 µg) from mouse small intestinal mucosa was retrotranscribed with 200 units MMLV-RT (GIBCO/BRL) by using an oligo-dT 8 primer. Extension of the 5' end of the cDNA was performed by polynucleotide transferase (30 units, GIBCO/BRL) in the presence of 0.4 mM dATP. PCR was performed with a specific antisense primer derived from the EST-sequence and a Tn primer containing a SalI adapter. A second round of PCR was performed with a nested antisense and a SalI-specific primer. The final PCR product was digested with SalI and Sau3A and was subcloned into pBluescript SK(+) (Stratagene). The same extension products were obtained by two independent rounds of reactions.

Construction of a Full-Length cDNA. Total RNA (10 µg) of mouse small intestine was retrotranscribed using an oligo-dT primer. A PCR fragment was amplified using a sense primer (nucleotides 8-28) and an antisense primer (nucleotides 961-980) and was cloned into the pGEM-T vector (GIBCO/BRL). The fragment corresponding to the 5' end of the transporter was excised with BglII and NotI. To obtain the missing 3.2 kb of the transporter, the EST clone was digested with BglII and SalI. Both parts were ligated into pSPORT1 (GIBCO/BRL), which had been digested with NotI and SalI.

Reverse Transcription (RT)-PCR Analysis. Total RNA (10 µg) isolated from different tissues was retrotranscribed with 200 units of reverse transcriptase (MMLV, GIBCO/BRL) using an oligo-dT primer. PCR was performed with a sense primer (nucleotides 620-638) and an antisense primer (nucleotides 887-906) derived from the EST clone. The mouse kidney-specific type II Na/Pi-cotransporter was amplified with a primer pair derived from the mouse NaPi-7 cDNA (7, 8): sense position, 10-29; antisense position, 195-210.
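The primer coordinates above fix the size of each expected RT-PCR product. A minimal sketch, assuming both primers are numbered 1-based and inclusive on the same sense-strand cDNA coordinate system; `amplicon_length` is a hypothetical helper, not part of the original methods.

```python
def amplicon_length(sense: tuple[int, int], antisense: tuple[int, int]) -> int:
    """Size in bp of the product running from the sense 5' end to the antisense 5' end.

    Assumes 1-based, inclusive coordinates on the sense strand (hypothetical convention).
    """
    sense_start, sense_end = sense
    anti_start, anti_end = antisense
    if not (sense_start <= sense_end < anti_start <= anti_end):
        raise ValueError("primers must face each other without overlapping")
    # Inclusive coordinates: last base - first base + 1
    return anti_end - sense_start + 1


type_iib = amplicon_length((620, 638), (887, 906))  # type IIb primer pair
napi7 = amplicon_length((10, 29), (195, 210))       # NaPi-7 (type IIa) primer pair
print(type_iib, napi7)  # 287 201
```

The same helper applies to any primer pair described with nucleotide positions.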



Abbreviations: EST, expressed sequence tag; kb, kilobase; RT, reverse transcription.

Data deposition: The sequence reported in this paper has been 
deposited in the GenBank database (accession no. AF081499). 
*To whom reprint requests should be addressed at: Institute of 
Physiology, University Zurich, Winterthurerstrasse 190, CH-8057 
Zurich, Switzerland, e-mail: Biber@Physiol.Unizh.CH. 
[Fig. 1 alignment image: type IIb (mouse), bovine (75%), flounder (64%), X. laevis (63%), and mouse type II (57%) Na/Pi-cotransporter sequences; the aligned residues are not recoverable from the scan.]

Fig. 1. Amino acid sequence comparisons. Sequences of type II Na/Pi-cotransporters were aligned with PileUp (sequence analysis software package; Genetics Computer Group, Madison, WI). Shaded boxes indicate consensus residues in all species listed. Additionally, residues identical among the type IIb, bovine, flounder, and X. laevis isoforms are underlined. Predicted transmembrane regions are indicated by bars, and potential N-glycosylation sites of the proposed extracellular loop are indicated by an asterisk. Numbers in parentheses (x%) indicate the percentage of overall homology to the type IIb sequence.
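The predicted transmembrane regions in Fig. 1 come from a hydropathy analysis whose exact method is not stated here. As an illustration only, the sketch below swaps in the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 cutoff (commonly used defaults, assumed here); the function names are hypothetical, and this is not claimed to reproduce the bars in the figure.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD_SCALE[aa] for aa in seq.upper()]  # KeyError on non-standard letters
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into 1-based (start, end) residue spans."""
    profile = hydropathy_profile(seq, window)
    segments: list[tuple[int, int]] = []
    run_start = None
    for i, score in enumerate(profile):
        if score > threshold and run_start is None:
            run_start = i
        elif score <= threshold and run_start is not None:
            # Windows run_start..i-1 were hydrophobic; the last one ends at residue i-1+window.
            segments.append((run_start + 1, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start + 1, len(profile) - 1 + window))
    return segments


# Toy sequence: a 19-residue leucine stretch flanked by lysines.
toy = "K" * 20 + "L" * 19 + "K" * 20
print(candidate_tm_segments(toy))  # [(16, 44)]
```

Merging whole windows makes the reported span slightly wider than the hydrophobic core. Dedicated topology predictors refine these boundaries.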



Northern Blots. Total RNA from mouse kidney cortex and upper small intestinal mucosa was isolated by the cesium trifluoroacetate/guanidinium thiocyanate method. Poly(A)+ RNA was obtained by oligo-dT cellulose chromatography. Poly(A)+ RNA (5 µg) was separated on agarose gels (1.2%) and transferred onto nylon membranes (BioDyn). Blots were hybridized with the following probes obtained by random priming in the presence of [α-32P]dCTP: (i) a 900-bp 5'-end fragment of the type IIb cDNA, obtained by restriction with NotI and BglII, and (ii) a full-length probe of the type IIa (NaPi-7; refs. 7 and 8) cDNA, obtained by restriction with NotI and SalI. Equal loading was confirmed using probes specific for ribosomal protein L28 (9). Hybridization was performed in 6× standard saline citrate (SSC), 5× Denhardt's, 0.5% SDS, and herring sperm DNA (100 µg/ml) at 65°C. Blots were washed sequentially with 2× SSC/0.1% SDS (room temperature, 10 min), 1× SSC/0.1% SDS (10 min at 40°C), and 0.5× SSC/0.1% SDS (20 min at 55°C). After exposure, results were analyzed with the ImageQuant software package (Molecular Dynamics).

[Fig. 2 gel image; the only legible label is "287 bp".]

Fig. 2. RT-PCR analysis using primers specific for the type IIb (A) or the renal type II (NaPi-7) (B) Na/Pi-cotransporter. All reactions were performed in the presence or absence of reverse transcriptase (RT; +, -). Integrity of the RNA preparations was confirmed by Northern blots using probes specific for β-actin (not shown).

Immunodetections. Rabbit polyclonal antibodies were raised against a synthetic peptide close to the C terminus. Mouse small intestinal brush border membranes were isolated by a Mg2+-precipitation technique (10), and Western blots were performed as described (11). For gel electrophoresis, membranes were denatured in 2% SDS without heating. For immunohistochemistry, mouse duodenum was rinsed with 0.9% NaCl and fixed by immersion in 3% paraformaldehyde, 0.05% picric acid in a 6:4 mixture of 0.1 M cacodylate buffer (pH 7.4). All other steps were performed as described (11). A swine anti-rabbit IgG conjugated to fluorescein isothiocyanate (Dakopatts, Glostrup, Denmark) was used as the secondary antibody. For peptide protection, the antigenic peptide was added at a concentration of 100 µg/ml.

Transport Assays in Oocytes of Xenopus laevis. Isolation and handling of X. laevis oocytes have been described elsewhere (12). Oocytes were injected with 5 ng cRNA (in 50 nl water). Transport was measured 3 days later, either by isotope flux as described (12, 13, 18) or electrophysiologically under steady-state, voltage-clamped conditions (14).

RESULTS 

By databank search, an EST cDNA clone (AA647858) prepared from mouse two-cell-stage embryos was found that showed 73% homology over a length of 312 bp to the mouse renal type II Na/Pi-cotransporter (7, 8). Preliminary analysis by Northern blotting indicated that a related mRNA species is expressed in small intestinal mucosa (data not shown). Full-length sequencing of the EST clone (3.5 kb) suggested that 700 bp were missing at the 5' end. By rapid amplification of 5' cDNA ends, a full-length cDNA (4,039 bp) was obtained (GenBank accession no. AF081499) containing an ORF (positions 45-2,137) coding for a protein of 697 amino acids (Fig. 1).

Amino acid comparisons revealed that the newly identified protein is 57-75% homologous to Na/Pi-cotransporters identified in bovine NBL cells (15), flounder kidney and intestine (16), and intestine and lung of X. laevis (17), and to the renal type II Na/Pi-cotransporter (NaPi-7; refs. 7 and 8) (Fig. 1). Overall homology to type I (1) and type III Na/Pi-cotransporters (4) was ~20%. As illustrated, the highest homologies among the listed Na/Pi-cotransporters are seen in regions that have also been proposed to represent transmembrane regions (1). The most striking difference between the newly identified protein and the mouse renal type II Na/Pi-cotransporter is found in the C-terminal region, which contains clusters of cysteine residues. A similar clustering of cysteine residues is also present in the Na/Pi-cotransporters of bovine cells, flounder kidney/intestine, and Xenopus intestine. Therefore, we propose to subdivide type II Na/Pi-cotransporters into a subfamily type IIa (represented by the renal isoforms of mouse, rat, rabbit, opossum kidney cells, and human; ref. 1) and type IIb (represented by the bovine, flounder, and Xenopus isoforms and the one described here).
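Percentages such as those above depend on how aligned columns are counted, and "homology" as reported by alignment packages may also credit similar, not only identical, residues. The sketch below computes one simple convention, percent identity over gap-free columns of a pairwise alignment. It is an illustrative assumption, not the PileUp scoring used for Fig. 1, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 8 gap-free columns, 7 identical.
print(percent_identity("MAPW-PELE", "MAPWSPELD"))  # 87.5
```

Dividing by the full alignment length, or by the length of the shorter sequence, gives different numbers for the same alignment, which is why reported percentages are only comparable under one stated convention.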

Expression of type IIb mRNA was analyzed by RT-PCR using total RNA (Fig. 2) and by Northern blots using poly(A)+ RNA (Fig. 3). By RT-PCR with primers positioned within the ORF, expression of type IIb mRNA was detected in the mucosa of the upper small intestine, colon, liver, lung, kidney, and testis. As a control, the same RNA samples were subjected to RT-PCR analysis for the renal type IIa cotransporter NaPi-7. A type IIa-related PCR product was found only in RNA isolated from kidney, confirming the kidney-specific expression of the type IIa cotransporter (1, 18).

Fig. 3. Northern blot analysis of poly(A)+ RNA isolated from mouse upper small intestinal mucosa and kidney cortex. Blots were hybridized with probes derived from a 900-bp 5'-end fragment of type IIb cDNA or from the full-length cDNA of the mouse renal type II Na/Pi-cotransporter. Hybridization to ribosomal protein L28 mRNA was used to confirm equal loading. For the NaPi-7 probe, five times less poly(A)+ RNA of kidney cortex was loaded.
Northern blots performed with poly(A)+ RNA isolated from mouse kidney cortex and small intestine are shown in Fig. 3. With a 900-bp 5'-end probe, the major mRNA species detected in small intestinal mRNA was at ~4 kb. In mRNA of kidney cortex, no such signal was detectable. In addition, a faint signal at ~2.5 kb was evident in small intestinal mRNA. With the same probe, two signals at ~2.5 kb were also detected with poly(A)+ RNA of mouse kidney cortex. To test for a possible cross-reaction with the renal type II cotransporter, the same blots were hybridized with probes derived from the NaPi-7 cDNA (Fig. 3B). This probe detected no signals with small intestinal mRNA, but with mouse kidney cortex mRNA a strong signal (double band at ~2.5 kb) representing the renal type II Na/Pi-cotransporter was observed. This suggested that the double band seen in kidney mRNA with the type IIb probe represents a cross-reaction with the type II (NaPi-7) cotransporter. Such a cross-reaction was confirmed with poly(A)+ RNA isolated from kidney cortex of mice in which the renal type II Na/Pi-cotransporter has been knocked out (19).

Fig. 4. Immunodetection of the type IIb Na/Pi-cotransporter. (A) Western blots of isolated mouse small intestinal brush border membranes (35 µg protein per lane). CB, Coomassie blue staining. Incubation with the first antibody was performed in the absence (lane 1) or presence (lane 2) of the antigenic peptide. (B) Immunofluorescence detection of the type IIb cotransporter in the apical membrane of enterocytes. Incubation with the primary antibody was performed in the absence (Upper) or presence (Lower) of antigenic peptide (100 µg/ml).

Fig. 5. Characterization of Na/Pi-cotransport in oocytes injected with type IIb cRNA. (A) Isotope flux measurements performed at 0.5 mM phosphate (bars 1, 2, 3) or 1 mM sulfate (bar 4) (mean ± SD of 8-10 oocytes; two experiments). Bar 1: Pi uptake in the presence of NaCl into oocytes injected with water; bars 2 and 3: Pi uptake into oocytes injected with type IIb cRNA in the presence of NaCl (2) or choline-Cl (3). (B) Inwardly directed currents measured under steady-state conditions using the two-electrode voltage clamp. Oocytes were voltage clamped at the indicated voltages and superfused with 1 mM phosphate in the absence or presence of NaCl (see refs. 8 and 14).

Fig. 6. Characterization of type IIb-mediated Na/Pi-cotransport. (A) Isotope flux measurements were performed at different Pi concentrations in the presence of sodium and were corrected by the values obtained from oocytes injected with water. The calculated value for Km(Pi) was 50 µM. (B) Electrophysiological determination of Km(Na). The calculated Km(Na) was 33 mM, and the stoichiometry was >2. (C) pH dependence of type IIb (black bars) and renal type II (NaPi-7) Na/Pi-cotransport (open bars). Values were corrected for uptake in the presence of choline-Cl, which was not changed by the different pH values. The data represent the mean ± SD of 8-10 oocytes. All experiments have been performed at least twice. The data given in C were derived from one oocyte. The same result was obtained with different oocytes from different batches.

Expression of the type IIb protein was analyzed by immunoblotting and immunofluorescence using a polyclonal antibody raised against a synthetic C-terminal peptide. On Western blots performed with isolated small intestinal brush border membranes, a single band of ~108 kDa was observed, which was completely prevented by inclusion of the antigenic peptide (Fig. 4A). Because four potential N-glycosylation sites are contained in a region representing an extracellular loop, and because N-glycosylation has been demonstrated in this loop for the renal type II cotransporter (20), the molecular mass of 108 kDa likely represents the glycosylated form of the type IIb protein (unglycosylated Mr 78 kDa). In cryostat sections of mouse duodenum, a specific reaction was observed at the apical membrane of enterocytes that was prevented by the antigenic peptide (Fig. 4B).
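Potential N-glycosylation sites such as the four mentioned above are usually identified by scanning for the N-X-S/T sequon, where X is any residue except proline. A minimal sketch of that scan; `n_glyc_sites` is a hypothetical helper, and the sequon rule only flags candidates, since actual glycosylation must be shown experimentally, as was done for the renal type II loop.

```python
import re

# N-X-S/T with X != P. The zero-width lookahead lets overlapping sequons
# (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glyc_sites("ANKTQNPSANST"))  # [2, 10]  (N6 is skipped: proline at X)
print(n_glyc_sites("NNST"))          # [1, 2]
```

Restricting the scan to a predicted extracellular loop would require topology coordinates, which this sketch does not attempt.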

Type IIb cRNA was injected into oocytes of X. laevis, and phosphate transport was measured by isotope flux in the presence and absence of sodium (Fig. 5A). Compared with oocytes injected with water, oocytes injected with type IIb cRNA exhibited a large expression of Pi transport, which depended on the presence of sodium and was not observed after injection of antisense cRNA (data not shown). Furthermore, oocytes injected with type IIb cRNA did not take up sulfate (SO4²⁻), suggesting that the type IIb cotransporter exhibits a substrate specificity similar to that described for the renal type IIa cotransporter (13, 18). As reported for the renal type II Na/Pi-cotransporter (8, 14), superfusion of oocytes expressing the type IIb cotransporter with phosphate elicited an inwardly directed current that depended on the presence of sodium and on the steady-state holding potential (Fig. 5B). This indicated that, as with the type II cotransporter, Na/Pi-cotransport by the type IIb cotransporter is electrogenic. As observed in isotope flux measurements (Fig. 5A), there was also evidence for a small contribution of Na-independent Pi transport. Based on the results shown in Fig. 5, transport characteristics of type IIb-induced Na/Pi-cotransport were determined by either isotope flux or electrophysiological measurements. By both methods, an apparent Km(Pi) of ~50 µM and a Km(Na) of ~30 mM were determined (Fig. 6). Because pH dependence is a hallmark of the renal type II Na/Pi-cotransporter, type IIb-mediated Na/Pi-cotransport was measured at different pH values (Fig. 6C). In contrast to the renal type II isoform, type IIb-associated Na/Pi-cotransport was less dependent on pH and was slightly higher at more acidic pH values.
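Apparent Km values like those above come from fitting saturation curves: a Michaelis-Menten hyperbola for Pi, and for Na+ a Hill-type curve whose coefficient is commonly read as an estimate of how many Na+ ions participate (presumably the basis of the ">2" stoichiometry). The sketch below assumes those two models and recovers Km with a Hanes-Woolf linearization. The helper names are hypothetical, the rates are synthetic values generated from the model itself rather than measured data, and this is not the authors' fitting procedure.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Rate at substrate concentration s for a single-site saturable transporter."""
    return vmax * s / (km + s)


def hill(s: float, vmax: float, k_half: float, n: float) -> float:
    """Hill-type activation; n > 1 suggests more than one ion binds per cycle."""
    return vmax * s**n / (k_half**n + s**n)


def hanes_woolf_fit(s_values: list[float], v_values: list[float]) -> tuple[float, float]:
    """Estimate (Vmax, Km) by least-squares regression of s/v on s.

    Hanes-Woolf form: s/v = s/Vmax + Km/Vmax, so slope = 1/Vmax
    and intercept = Km/Vmax.
    """
    ys = [s / v for s, v in zip(s_values, v_values)]
    n = len(s_values)
    mean_x = sum(s_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in s_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(s_values, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    return vmax, intercept * vmax


# Synthetic, noise-free rates from the model (Km = 50 uM, arbitrary Vmax = 1.0).
pi_um = [10.0, 25.0, 50.0, 100.0, 200.0, 500.0]
rates = [michaelis_menten(s, vmax=1.0, km=50.0) for s in pi_um]
vmax_est, km_est = hanes_woolf_fit(pi_um, rates)
print(f"Vmax = {vmax_est:.3f}, Km = {km_est:.1f} uM")  # Vmax = 1.000, Km = 50.0 uM
```

With real, noisy uptake data, a direct nonlinear least-squares fit of the hyperbola is usually preferred, because linearizations distort the error structure.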

DISCUSSION 

Intake and extrusion of inorganic phosphate are determined by the intestinal and renal handling of phosphate. In recent years, several renal phosphate transporters and their roles in the regulation of the renal handling of Pi have been described (1, 2, 4, 7, 8, 18). However, phosphate cotransporters expressed in the mammalian small intestine are far less well characterized.

With respect to the renal handling of Pi, three dissimilar types of sodium-dependent phosphate cotransporters expressed in the plasma membrane have been described so far: type I, type II, and type III (1). The type II Na/Pi-cotransporter has been shown to play a major role in proximal tubular reabsorption of Pi, and the proximal tubular capacity to reabsorb Pi depends to a large extent on the net abundance of type II cotransporters in the apical membrane of proximal tubules (1, 2). The roles of type I and type III cotransporters are less clear.

So far, expression of type II mRNA has been described in kidney cortex only. Expression of type I mRNA is also found in liver and brain, and expression of type III mRNA seems to occur in almost every tissue (1, 4), notably including the intestine. The nonspecific tissue expression pattern of type III cotransporters rules out the possibility that type III Na/Pi-cotransporters are candidates for small intestinal apical Na/Pi-cotransporters.

Starting from an EST clone from an embryonic mouse cDNA library, we obtained the full-length cDNA coding for a type II Na/Pi-cotransporter. The deduced protein showed high homology to previously described type II Na/Pi-cotransporters. The highest homologies were found in regions that most likely represent transmembrane segments, and the most striking differences between the identified cotransporter, the bovine, flounder, and X. laevis isoforms, and the renal type II Na/Pi-cotransporters are the cysteine clusters at the C-terminal ends. Therefore, we propose the name type IIb Na/Pi-cotransporter for the identified mammalian isoform and propose to extend this nomenclature to the bovine (15), flounder (16), and X. laevis (17) isoforms.

Injection of type IIb cRNA into oocytes of X. laevis resulted in expression of Na-dependent phosphate transport with characteristics similar to those observed for Na/Pi-cotransport mediated by the renal type II Na/Pi-cotransporter (8, 14, 18). However, the most striking difference of type IIb-mediated Na/Pi-cotransport was its pH dependence.

So far, proteins involved in mammalian small intestinal Na/Pi-cotransport have not been described. In nonruminants, the highest rates of Pi reabsorption are observed in the upper small intestine (5). Na/Pi-cotransport in mouse small intestine is highest at a more acidic pH and exhibits a Km value for Pi of ~50 µM (5, 6). The functional characteristics observed for type IIb-mediated Na/Pi-cotransport agree with these data and support the notion that the type IIb cotransporter may represent a candidate for a small intestinal Na/Pi-cotransporter. This is further supported by the observation that both type IIb mRNA and protein are expressed in mouse small intestinal mucosa and, notably, that the type IIb protein is localized at the brush border membrane of the enterocytes.

In summary, we have identified a mammalian Na/Pi-cotransporter with high homology to previously described type II Na/Pi-cotransporters. Expression of both the mRNA and the protein of this transporter was demonstrated in the mammalian small intestine. Kinetic properties and pH dependence of type IIb-associated Na/Pi-cotransport favor this protein as a candidate for a Na/Pi-cotransporter involved in intestinal Pi reabsorption. Apart from the small intestine, expression of type IIb mRNA was also detected in other tissues, such as lung, colon, liver, kidney, and testis. The physiological role of Na/Pi-cotransport mediated by the type IIb isoform in small intestine, as well as in the other tissues, remains to be determined.
Phosphate plays a crucial role in cellular metabolism, and its homeostatic regulation in intestinal and renal epithelia is critical. Apically expressed sodium-phosphate (Na+-Pi) transporters play a critical role in this regulation. We have isolated a cDNA (HGMW-approved symbol SLC34A2) encoding a novel human small intestinal Na+-Pi transporter. The cDNA is 4135 bp in length, with an open reading frame that predicts a 689-amino-acid polypeptide. The putative protein has 76% homology to the mouse intestinal type II Na+-Pi transporter (Na/Pi-IIb) and lower homologies with renal type II Na+-Pi transporters. Northern blots showed a single transcript of 5.0 kb in human lung, small intestine, and kidney. Computer analysis suggests a protein with 11 transmembrane domains and several potential posttranslational modification sites. Functional characterization in Xenopus laevis oocytes showed that this cDNA encodes a functional Na+-Pi transporter. Furthermore, the gene encoding this cDNA was mapped to human chromosome 4p15.1-p15.3 by FISH. © 1999 Academic Press



Phosphate (Pi) plays a major role in growth, development, bone formation, and cellular metabolism. The kidney and small intestine are important regulatory sites that maintain extracellular Pi concentrations. Sodium-coupled phosphate transport is the major form of Pi absorption in both kidney and intestine. Phosphate uptake by renal and intestinal brush-border membrane vesicles has been studied previously in human (2), rat (6), rabbit (3), and mouse (15). The molecular basis of Pi uptake in kidney has been identified (sodium-phosphate [Na+-Pi] transporters types I and II) and is well characterized. The type II Na+-Pi transporter is the major pathway of Pi reabsorption in kidney (5, 12). However, little is known about Pi absorption in the intestine. To date, only one mammalian intestinal Na+-Pi transporter has been identified, from mouse (10).

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. AF146796.

To further define the role of the small intestinal Na+-Pi transporter (NPT) in body Pi homeostasis, we have isolated a novel cDNA² from a human small intestinal cDNA library. Initially, we designed PCR primers (homologous to mouse intestinal Na/Pi-IIb cDNA; forward primer at 686-705 bp and reverse primer at 1423-1442 bp) for RT-PCR experiments. PCR products of the predicted size (760 bp) were obtained from human small intestinal cDNA (mRNA

² The HGMW-approved symbol for the gene described in this paper is SLC34A2.

MAPWPELGDAQPNPDKYLEGAAGQQPTAPDKSKETNKNNTEAPVTKIELL 50 
PSYSTATLIDEPTEVDDPWNLPTLQDSGIKWSERDTKGKILCFFQGIGRL 100 

TMl TM2 
ILL LGFLYFFVCSLDILSSAFQLVG GKMAGQFFSNSSI MSNPLLGLVIGV 150 

TM3 

LVTVLVQSSS TSTSIWSMVSSSLLTVRA AIPIIMGANIGTSITNTIVAL 200 

TM4 

MQVGDRSEFRRAFAGATVHD FFNWLSLLVLLPVEVATHYLEII TQLIVES 250 
FHFKNGEDAPDLLKVITKPFTKLIVQLDKKVISQIAMNDEKAKNKSLVKI 300 
WCKT FTNKTQ INVT V PS TANCT SPSLCWTDGI QNWTMKNVT YKENIAKCQ 350 
TM5 

HIFVNFHLPD LAVGTILLILSLLVLCGCLIMIV KILGSVLKGQVATVIKK 400 

TM6 TM7 
TINTD FPFPFAWLTGYLAILVGAGMTFI VQS SSVFTSALTPLIGIGVITI 450

TM8 TM9 
SRAYPL TLGSNIGTTTTAILAALAS PGNALRSS LQIALCHFFFNISGILL 500 

TM10 

VOTIPFTRLPIRMAKGLGNISAKYRW FAVFYL1IFFFLIPLTVFGLSLA G 550 
TM11 

WR VLVGVGVPWFIIILVLCLRLLQ SRCPRVLPKKLQNWNFLPLWMRSLK 600 
PWDAWSKFTGCFQMRCCCCCRVCCRACCLLCGCPKCCRCSKCCEDLEEA 650 
QEGQDVPVKAPET FDNI T I SREAQGEVPASDSKTECTAL 689 

FIG. 1. Predicted amino acid sequence of human small intestinal NPT. Underlined amino acid sequences represent potential transmembrane (TM) domain regions, which are numbered sequentially. Boldface amino acids represent putative N-glycosylation sites. The nucleotide sequence can be found in GenBank under Accession No. AF146796.
FIG. 2. mRNA expression of human intestinal NPT. (A) mRNA from 76 different human tissues was loaded on the nylon membrane [mRNA levels were normalized against eight different gene markers (Clontech manual), so this blot is quantitative]. The blot was hybridized with cDNA-specific probes under high-stringency conditions and exposed to film. (B) Northern blot of human small intestinal NPT. The blot was hybridized with cDNA-specific probes under high-stringency conditions and exposed to film. A 5-kb transcript was detected in several tissues. This blot is not quantitative, as mRNA loading was not normalized.



from Clontech, Palo Alto, CA). Sequencing of the amplified fragment revealed 82% nucleotide sequence identity with mouse intestinal Na/Pi-IIb, suggesting that this cDNA likely represents a type II Na+-Pi transporter in human small intestine.

Using the PCR product to generate radioactive probes, we isolated a cDNA clone from an enriched human small intestinal cDNA library (Edge Biosystems, Gaithersburg, MD) under high-stringency screening conditions. This cDNA was sequenced on both strands. Sequence data indicated that this human intestinal cDNA is 4135 bp long and encodes a putative protein of 689 amino acids (open reading frame 36 to 2102 bp) (Fig. 1). Hydropathy analysis (Omiga 1.1.3 software, Oxford Molecular Ltd., Oxford, England) predicts 11 transmembrane domains. The putative protein has many potential posttranslational modification sites. We compared our cDNA with other identified Na+-Pi transporter cDNAs; it showed over 60% nucleotide sequence identity and over 75% amino acid sequence identity with the bovine renal type II Na+-Pi transporter (7), the mouse intestinal type II Na+-Pi transporter (10), and an unpublished human NPT (GenBank Accession No. AF111856). Recently, another type I Na+-Pi transporter cDNA was isolated from a human intestinal cDNA library (16); however, its sequence similarity with our cDNA is very low (<20%). Overall, these findings suggest that our newly identified human cDNA clone encodes a protein that belongs to the intestinal type II Na+-Pi transporter gene family.
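The ORF coordinates and the protein length stated above can be cross-checked with a little arithmetic, since each codon spans 3 nt. A minimal sketch, assuming 1-based, inclusive nucleotide coordinates; both helpers are hypothetical. Under that assumption, positions 36 to 2102 give 2067 nt, exactly 689 codons, which implies the stated range excludes the stop codon.

```python
def orf_codons(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive nucleotide coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} nt is not a whole number of codons")
    return length // 3


def orf_end(start: int, n_residues: int, include_stop: bool = False) -> int:
    """Expected last nucleotide of an ORF encoding n_residues amino acids."""
    codons = n_residues + (1 if include_stop else 0)
    return start + 3 * codons - 1


print(orf_codons(36, 2102))                 # 689
print(orf_end(36, 689))                     # 2102
print(orf_end(36, 689, include_stop=True))  # 2105
```

If a source counts the stop codon inside the ORF range, the end coordinate shifts by 3 nt, so checks like this should state which convention they assume.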
mRNA expression of this cDNA clone was analyzed with a human 76-tissue mRNA blot (Human Multiple Tissue Expression Array, Clontech). Hybridization using the 760-bp PCR fragment as a template to generate radiolabeled probes showed that this gene is expressed in many tissues (Fig. 2A), with the highest expression in lung, small intestine, and kidney. Furthermore, Northern blot analysis [human 11-tissue mRNA blot (Clontech)] showed a single transcript of approximately 5.0 kb in human lung, small intestine, kidney, liver, and placenta (Fig. 2B). Interestingly, previous studies detected human renal NPT mRNA transcripts at 2.0 kb [type I Na+-Pi transporter (14)] and at 2.7 kb [type II Na+-Pi transporter (12)]. This suggests that the 5-kb transcript detected in human kidney likely represents an unidentified type II Na+-Pi transporter isoform. Also, since this gene is highly expressed in adult and fetal lung, it seems probable that this newly identified human Na+-Pi transporter has an important physiological function in lung (possibly in the production of surfactant by the alveoli).

To characterize the function of the protein encoded by this cDNA, we produced cRNA, injected it into Xenopus laevis oocytes, and measured radiolabeled Pi influx in the presence or absence of Na+ (1, 5). Compared with uninjected oocytes, oocytes injected with Na+-Pi transporter cRNA exhibited an approximately 55-fold increase in Pi transport (P < 0.0001) (Fig. 3), which suggests that this cDNA does indeed encode a functional Na+-Pi transporter.
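The fold increase above compares mean uptake in cRNA-injected oocytes with that in uninjected controls, and the choline chloride condition isolates the Na+-dependent component. A minimal sketch of both calculations, assuming simple ratios and differences of group means; the function names are hypothetical, and the paper does not state that it computed the values exactly this way.

```python
from statistics import mean


def fold_increase(injected: list[float], control: list[float]) -> float:
    """Ratio of mean uptake in cRNA-injected oocytes to mean uptake in controls."""
    return mean(injected) / mean(control)


def sodium_dependent_uptake(uptake_nacl: list[float],
                            uptake_choline: list[float]) -> float:
    """Na+-dependent component: mean uptake with NaCl minus mean uptake with choline chloride."""
    return mean(uptake_nacl) - mean(uptake_choline)
```

Statistical comparison of the groups (the reported P value) would additionally require per-oocyte values and a test such as a t-test, which this sketch leaves out.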

To identify the locus of our Na+-Pi transporter gene, a PCR-amplified 760-bp cDNA fragment was used to localize the gene by FISH mapping in lymphocytes isolated from human blood (SeeDNA Biotech Inc., Windsor, Ontario, Canada). The 760-bp human intestinal NPT cDNA probe




[Fig. 3 bar chart; x-axis groups: C-NaCl, HNPT-NaCl, C-Choline, HNPT-Choline]

FIG. 3. Characterization of human intestinal type II NPT in oocytes injected with cRNA. Isotope (Pi) influx measurements were performed in the presence of NaCl or choline chloride. An approximately 55-fold increase in Pi uptake was seen in cRNA-injected oocytes. C, uninjected oocytes; HNPT, cRNA-injected oocytes. (Mean ± SEM of 5 oocytes per group; n = 4; *P < 0.0001 for HNPT versus all other groups.)




FIG. 4. Chromosomal localization of the human small intestinal NPT gene to chromosome 4p15.1-p15.3. (Left, arrow) FISH signals on a human chromosome hybridized with the 760-bp probe. (Right) The same mitotic chromosome stained with DAPI to identify human chromosome 4.



was biotinylated with dATP (8), and FISH detection was performed by the procedure described by Heng and Tsui (9). FISH signals and the DAPI banding pattern were recorded separately, and the FISH mapping data were assigned to chromosomal bands by superimposing FISH signals on DAPI-banded chromosomes. A hybridization signal was detected only on chromosome 4p15.1-p15.3 (Fig. 4). Previous investigations showed that human renal Na+-Pi transporters are found on chromosomes 5 and 6 [5q35 (11, 13), 6p21.3 (unpublished results; GenBank Accession Nos. U90544 and U90545), and 6p21.1-p23 (4, 11)]. Also, a type I Na+-Pi transporter cDNA from human intestinal mucosa was localized to 6p21.3-p23 (16). These data indicate that Na+-Pi transporter genes are widely distributed in the genome.

In summary, we isolated a 4135-bp type II Na+-Pi transporter cDNA from a human small intestinal cDNA library. This cDNA encodes a 689-amino-acid protein that shares high sequence homology with intestinal and renal type II Na+-Pi cDNAs from various mammalian species. This gene encodes a Na+-Pi transporter that is highly expressed in human lung, small intestine, and kidney and is localized to chromosome 4p15.1-p15.3.
Note added in proof. The unpublished human Na+-Pi transporter cDNA sequence mentioned in the text (GenBank Accession No. AF111856) has now been published (Field, J. A., Zhang, L., Brun, K. A., Brooks, D. P., and Edwards, R. M. (1999), Cloning and functional characterization of a sodium-dependent phosphate transporter expressed in human lung and small intestine. Biochem. Biophys. Res. Commun. 258: 578-582).
Predicting functions from protein
sequences— where are the bottlenecks? 



Peer Bork¹ & Eugene V. Koonin²



The exponential growth of sequence data does not 
necessarily lead to an increase in knowledge about the 
functions of genes and their products. Prediction of 
function using comparative sequence analysis is 
extremely powerful but, if not performed 
appropriately, may also lead to the creation and 
propagation of assignment errors. While current 
homology detection methods can cope with the data 
flow, the identification, verification and annotation of 
functional features need to be drastically improved. 



With the rapid growth of sequence-related and other databases, 
there is increasing concern about the impact of this information 
explosion 1,2. Is the burgeoning diversity of information an advan-
tage or will it play havoc with genome analysis and ultimately lead
to an error catastrophe? The eventual success of genome projects 
will depend on our ability to handle information in a manner that 
enhances the capability for function prediction rather than pol- 
lutes the analysis with noise. Only a minority of sequenced genes 
have been studied in direct experiments. In the foreseeable future, 
the gap between the number of sequences available and the extent 
of functional characterization of gene products is expected to 
broaden even further. It is apparent that computer procedures for 
the prediction of functional features from sequence are much 
faster and cheaper than wet-laboratory experiments and, by default, are
applied to each gene that is sequenced. This puts tremendous 
pressure on computational approaches to ascribe as much func- 
tional information as possible to each gene. It appears, however, 
that within the typical framework of current sequencing projects, 
optimization of the computer analysis and functional annotation 
has not yet been achieved. A testimony to this is the repeated dis- 
covery of new, functionally relevant features in sequences that 
already have been subjected to standard computer procedures. 

Given the database growth and the accompanying increase in 
noise and redundancy, we believe that there are currently two 
major bottlenecks that need to be overcome en route to efficient 
functional predictions from protein sequences. First, there is the 
lack of a widely accepted, robust and continuously updated suite 
of sequence analysis methods integrated into a coherent and effi-
cient prediction system. Second, there is considerable 'noise' in
the presentation of experimental information, leading to insuffi-
cient or erroneous functional assignment in sequence databases.

Here we review some computer-based approaches that allow 
utilization of more functional and structural information than 
the current standard schemes, and discuss some of the difficul- 
ties in handling and interpreting functional information. 

Effects of database growth 

From a purely statistical standpoint, the chances of detecting sig-
nificant similarity between a new sequence and the ones already
available in databases decrease with the expansion of the search
space, in this case, database growth 3,4. Fortunately, at least three
major factors counter this adverse statistical effect. First, the 
sequence space is not infinite: new sequences fill it and inevitably 
increase the chance of finding homologues. Second, complete 
genome sequences of phylogenetically distant species bring a 
qualitative improvement to the representation of conserved gene 
families 5 . With numerous genome sequences of unicellular 
organisms already available and the majority of human genes rep- 
resented in the Expressed Sequence Tags (EST) database 6 , it is 
becoming increasingly likely that a family to which a new protein 
belongs is already represented in the databases 7. Third, the devel-
opment of new, more sensitive methods for information filtering
and database searching, as well as improved strategies for their 
application, result in the delineation of previously undetectable, 
subtle relationships between sequences. 
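The statistical point above, that a larger search space makes the same similarity less significant, can be made concrete with the Karlin-Altschul expression used by BLAST-type programs for E-values: E = m·n·2^(-S′), for a normalized bit score S′, query length m and database length n. The text does not give this formula; the sketch below is an illustration under that model only, ignoring the edge-length corrections real BLAST applies, and the query and database sizes are hypothetical.

```python
import math

def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E-value under the Karlin-Altschul model with a normalized bit score:
    E = m * n * 2**(-S'), m = query length, n = database length (residues).
    Edge-effect corrections used by real BLAST implementations are ignored."""
    return query_len * db_len * 2.0 ** (-bit_score)

def bits_needed(query_len: int, db_len: int, e_cutoff: float) -> float:
    """Bit score a hit must reach for its E-value to drop to e_cutoff."""
    return math.log2(query_len * db_len / e_cutoff)

if __name__ == "__main__":
    m = 300  # hypothetical query length (residues)
    for n in (10**7, 10**9):  # hypothetical database sizes (residues)
        print(f"n={n:.0e}  E(50 bits)={expected_chance_hits(50, m, n):.2e}  "
              f"bits needed for E=0.01: {bits_needed(m, n, 0.01):.1f}")
```

Under this model, a hundredfold larger database multiplies every E-value by 100 and raises the bit score needed for a fixed cutoff by log2(100), about 6.6 bits. The countervailing factors listed in the text work by adding true homologues and sharper methods, not by changing this arithmetic.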

The net effect is that, for a given sequence, the likelihood of 
detecting a homologue in the databases steadily increases with 
time. To illustrate this, we followed the kinetics of homology iden- 
tification and function prediction for an unbiased data set, namely 
the proteins encoded by the genes on yeast chromosome III. After 
the initial characterization of this eukaryotic chromosome in 1992 
(ref. 8), and early efforts to push the limits of computer-aided pre-
dictions 9-11, there has been a continuous linear increase in the
fraction of proteins that have identifiable homologues and pre-
dictable functions (Fig. 1). Although due in part to a decrease in
the number of open reading frames (ORFs) identified as likely 
genes, this trend demonstrates the increasing utility of computer 
analysis despite database growth. Of the current set of 25 predicted 
genes without homology or functional assignment, 15 are smaller 
than 150 amino acids and may not be expressed at all 12 . With 
homologues now detectable for 85% of the proteins encoded by 
genes on yeast chromosome III and at least some functional fea- 
tures identified for 70%, we may soon approach an upper limit for 
computer-aided predictions. The depth of the functional charac- 
terization attainable for many proteins, however, will continue to 
increase, which is not reflected in the above numbers. 

Reducing the noise in sequence searches 

In-depth analysis of protein sequences often results in functional 
predictions not attained in the original studies. This can be illus- 
trated by the results obtained with genes mutated in human dis- 
eases (disease genes). Identification of such genes typically 
involves the time- and labour-consuming process of positional 
cloning 13 . It is therefore critical that as much functionally rele- 
vant information as possible is extracted from the protein 
sequence encoded by a disease gene once it becomes available. It 
is not uncommon, however, that rapid computer re-analysis pro- 
duces unexpected insight into the evolutionary relationships and 
Table 1 • Selected examples of computer-aided discoveries with positionally cloned disease genes

Gene (protein) | Disease | Original prediction based on protein sequence | New findings by computer analysis and implications | Experimental support
LEP (leptin) | Hereditary obesity | None (ref. 23) | Structural similarity to helical cytokines identified by threading; likely cytokine activity and features (ref. 24) | Leptin structure is typical of helical cytokines (ref. 25); leptin receptor is homologous and functionally analogous to cytokine receptors (ref. 26)
DMD (dystrophin) | Muscular dystrophy | Spectrin repeats (ref. 40) | WW and ZZ signalling domains (refs 41,42) | 3D structure of WW and ligand (ref. 43)
HD (huntingtin) | Huntington disease | None (ref. 44) | HEAT repeats covering a considerable fraction of the protein and indicating structural features (ref. 45) | Conservation of repeats in homologues
BRCA1 (BRCA1) | Hereditary breast cancer | N-terminal RING-finger domain (ref. 46) | C-terminal BRCT domain conserved in many DNA repair-dependent checkpoint proteins; likely role in cell cycle checkpoints (refs 47-49) | BRCT domain coincides with transcription activation domain (ref. 50); BRCA1 involved in repair and cell cycle (ref. 51)
BRCA2 (BRCA2) | Hereditary breast cancer | Coiled-coil domains (ref. 52) | Previously unknown repeats covering almost a third of this large protein (ref. 53) | Verification of the repeats by sequencing homologues (ref. 54)
CHM (CHM) | Choroideremia (hereditary blindness) | Homologue of guanine nucleotide dissociation inhibitor Rab-GDI (ref. 55) | FAD (NAD)-binding domain (ref. 56) | 3D structure confirmed presence of a dinucleotide-binding domain (ref. 57)
FRDA1 (frataxin) | Friedreich ataxia | Highly conserved eukaryotic homologues (ref. 58) | Bacterial homologues (CyaY); on the basis of phylogenetic analysis, a mitochondrial function suggested for frataxin (ref. 59) | Mitochondrial localization of frataxin demonstrated (ref. 60)
CLCN1 (CLCN1) | Myotonia (Thomsen disease) | Chloride channel (ref. 61) | CBS domain (ref. 62) | NA
TAZ (tafazzin) | Barth syndrome | None (ref. 63) | Acyltransferase domain; possible role in membrane biogenesis (ref. 64) | NA
MLH1 (MLH1) | Hereditary non-polyposis colon cancer | Homologue of bacterial and yeast DNA repair protein MutL (ref. 65) | ATPase domain conserved in topoisomerase type II (3D structure available), HSP90, and His kinases; predicted ATPase activity (refs 14,66) | NA
WRN (WRN) | Werner syndrome (premature aging) | DNA helicase domain (RecQ homologue) (ref. 67) | N-terminal exonuclease domain (structure known for homologous domain in bacterial PolA) and putative C-terminal RNA-binding domain conserved in BLM and RNAase D; predicted exonuclease activity (refs 14,68,69) | NA
BLM (BLM) | Bloom syndrome | DNA helicase domain (RecQ homologue) (ref. 70) | C-terminal nucleic acid-binding domain conserved in WRN and RNAase D (ref. 69) | NA
WAS (WAS) | Wiskott-Aldrich syndrome | WH1 (ref. 71) | WH1 in homer with ligand-binding implications (ref. 72) | NA
SCA2 (ataxin-2) | Spinocerebellar ataxia-2 | Polyglutamine stretch expanded in ataxia (ref. 73) | Novel conserved domain shared with spliceosomal Sm proteins and bacterial global transcription regulators; possible role in splicing (ref. 74) | NA

NA, no data available






nature genetics volume 18 april 1998 






likely functions of disease [genes] [...] (Table 1; [for more] examples, ref. 14 [...]). [...] functional char[acteristics] [...] have been missed in the original [...]
Generally, the problem of database search can be formulated in terms of the signal-to-noise ratio. One way to [improve this ratio is to increase the sensi]tivity of the search method. The recently [developed] PSI-BLAST method (Position-Specific Iterated BLAST) [...] is a major step in this direction. It combines the [...] version of the popular BLAST [program with] profile analysis. PSI-BLAST is fast and highly sensitive and, when combined with [...] (see below), also highly selective. Nevertheless, there [...] using [...], as the signal-to-noise ratio can be in[creased ...] [...] extent than the noise, [...] regions using the SEG program [...] as a default parameter in the BLAST [programs]. [Ap]proximately 15% of protein-sequence databases appear to [...] these low-complexity regions, which [...] non-globular domains [...] patterns. Low-complexity sequences tend to produce database hits with artifactually low P-values [...] a random [...] sequences with a similarly reduced residue alphabet. [...] methods have been developed that also [...] other low-complexity [...] step. In addition, programs exist for [...], although one [...] domains. Indeed, computational [...] suggest numerous undetected domains [...] additional functional and [...]
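The text refers to masking low-complexity regions with the SEG program before database searches. The idea can be illustrated with a simple window-entropy filter. This is a minimal sketch, not the SEG algorithm itself: it flags every 12-residue window whose Shannon entropy falls below 2.2 bits and replaces those residues with X. The window length and threshold are assumptions chosen to echo commonly cited SEG settings, and SEG's two-threshold trigger-and-extend scheme is omitted.

```python
import math
from collections import Counter

def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues lying in any low-entropy window with 'X'.
    Simplified illustration only; window and threshold are assumed values."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

# Hypothetical sequence with an embedded polyglutamine stretch.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 20 + "LEDGRTLSDYNIQKESTLHLVLRLRGG"
print(mask_low_complexity(seq))
```

The polyglutamine stretch, a compositional bias of the kind that produces artifactually low P-values, is replaced by X, while the compositionally diverse flanks are kept for the similarity search.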
In the absence of recognizable sequence similarity, 'threading' approaches (that is, fold recognition [...] sequence compatibility with [...]) might reveal [...] insights, as has been successfully demonstrated for leptin, the obesity gene product [...]. The prediction of [...] has been confirmed both by [3D structure] determination and by numerous functional studies [...]. However, the accuracy of threading methods is limited, and [...] the most powerful modern [...] approaches that of threading. For example, highly accurate fold assignments for more than 35[% of] Mycoplasma genitalium proteins (or some of their domains) are already possible using iterative homology searches [...] has been claimed [...].

Effects of noise on [...]

Functional information is hard to quantify [...]. [A] critical point in an analysis is determining [what is] shared between the [query and the sequen]ce(s) similar to it in the database. In the best case, [these are] orthologues; [the] corresponding genes [...] perform the same molecular function. Even this, however, is often difficult to discern in a situation where [there are] paralogues (other members of a multigene family [with] distinct, but perhaps related, functions). With many [complete genome] sequences [...] available, databases of orthologues are [...] to become indispensable for functional [...]. [...] homologue to the query [...] has only recently [...]




Table 2 • A checklist for in-depth analysis of protein sequences and prediction of function from sequence*

Procedure | Purpose and comment

Identification of and filtering for structural features
Mask non-globular or highly compositionally biased regions (reduced residue alphabet) | Reduce noise and avoid spurious hits due to low-complexity regions
Mask coiled-coil regions | This is a special type of low-complexity region that causes numerous other coiled-coil regions to match but is not efficiently detected by general methods for complexity analysis
Identify transmembrane regions (including signal sequences and GPI anchors) | Yet another form of composition bias that may result in matches with non-homologous membrane regions; the presence of these domains should be taken into account in all functional predictions
Identify internal repeats | Reduces the search space for remaining parts and also may lead to the detection of novel repeat types
Predict secondary structure | The best programs predict a protein's structural class; use of multiple alignments significantly improves the accuracy of prediction

Identification of homologues
Identify known domains in dedicated databases (for example, Pfam, PROSITE, BLOCKS, PRINTS, SMART) prior to a BLAST search | Identification of annotated domains may be more sensitive when these databases are used; removal of known domains also reduces the search space for remaining parts of the protein
Search complete sequence databases with subsequences of long (>200 a.a.) proteins individually; preferably use subsequences separated by known domains or low-complexity regions | Increase search sensitivity by reducing the search space; exclude domains with numerous homologues (for example, protein kinases), which may obscure even highly significant similarity to other domains of the query
Perform reciprocal searches to verify weak similarity to possible homologues | The alignments of a potentially relevant database hit with its indisputable homologues support (if the conservation pattern is consistent) or reject (if it is different) a weak pairwise similarity
Perform exhaustive, iterative database searches | Database search methods are non-transitive and non-symmetric; therefore, analysis of a protein family should be performed iteratively, starting with different members, until no new homologues are detected
Combine search for pairwise sequence similarity (for example, first BLAST scan) with profile, motif and pattern searches (explicit example in PSI-BLAST, ref. 16) or by using the various programs available (ref. 15) | The information contained in a multiple alignment provides for amplification of weak but potentially important sequence signals and is indispensable for the delineation of protein superfamilies

Prediction of protein functions
Carefully consider domain organization and distinct functions of individual domains | Many proteins are multifunctional; assignment of a single function, which is still common in genome projects, results in loss of information and outright errors
Do not take database annotation for granted, especially if only one homologue is detectable or there is inconsistency between different homologues | Databases contain a number of incorrect annotations due to experimental errors as well as functional assignments on the basis of dubious sequence similarity
Do not simply transfer functional information from the best hit | The best hit is frequently hypothetical or poorly annotated; other hits with similar or even lower scores may be more informative; even the best hit may have a different function (see below)
Do cluster analysis of the homologues to identify the appropriate level of precision for functional prediction | It is typical that the general function of a protein can be identified easily but the prediction of substrate specificity is unwarranted; for example, many permeases of different specificity show approximately the same level of similarity to each other
Check sequence context (e.g. likely clashes in the co-occurrence of a signal sequence and a zinc finger, or glycosylation sites in a cytoplasmic protein) | Comparison of different predicted structural and functional features helps avoid erroneous predictions
Identify similarities to proteins with known 3D structure | Models of highly conserved homologues can be built and might reveal further functional insights

*Complementary checklists can be found in ref. 15.
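Table 2 advises performing exhaustive, iterative database searches, starting from different family members until no new homologues appear, because search results are non-transitive and non-symmetric. The loop below is a minimal sketch of that stopping rule. It assumes a hypothetical `search` callable standing in for a real database search (for example, one BLAST run) that returns hit identifiers; the hit lists are invented to show the asymmetry and are not real data.

```python
from collections import deque
from typing import Callable, Iterable

def iterative_family_search(seed: str, search: Callable[[str], Iterable[str]]) -> set[str]:
    """Re-query with every newly found member until no new homologues appear.
    `search` is a hypothetical stand-in returning significant hits for one query."""
    family = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in family:  # only new members trigger another search
                family.add(hit)
                queue.append(hit)
    return family

# Hypothetical, deliberately non-symmetric hit lists: A finds B, but nothing finds E.
HITS = {"A": ["B"], "B": ["C"], "C": ["D", "B"], "D": [], "E": ["A"]}

print(sorted(iterative_family_search("A", lambda q: HITS.get(q, []))))  # prints ['A', 'B', 'C', 'D']
print(sorted(iterative_family_search("E", lambda q: HITS.get(q, []))))  # prints ['A', 'B', 'C', 'D', 'E']
```

Starting from A never recovers E, while starting from E recovers the whole set. This is why the checklist recommends restarting the analysis from different members of the family.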
Fig. 2 Sequence alignment of a selected set of SAM domains with p73-like proteins. Conserved hydrophobic positions are shown in bold; residues that are conserved in at least 75% of the sequences are highlighted. The SAM consensus 13 corresponds to the consensus of the alignment. Secondary structure predictions ('a' represents 'alpha helix') have been taken from ref. 33. Position in the sequence and database accession numbers are given in the third and last column, respectively.



tional annotations have already been incorporated and subse-
quently propagated in sequence databases. A quality index for
functional annotation in databases still remains a distant goal 
and new approaches are required to improve the sensitivity of 
functional characterization so as to avoid functional over- and 
under-predictions for a given database match. 

Transfer of functional information from sequences in the 
database to the query is also hampered by the effects of noise in 
the functional description of proteins. Updates on functional 
features require an awareness of the scientific literature; an 
experiment in one species on a previously sequenced gene 
brings important consequences for homologous genes in other 
species. This has led to a gap between functional information 
contained in the sequence databases and the specialized knowl- 
edge embodied in the literature. At present, there is no auto- 
matic method that can replace literature searches. A recent case 
in point is p73, a human paralogue of the tumour suppressor 
p53. It contains a carboxy-terminal extension, for which simi- 
larity has only been found in squid ^53* (ref. 32). However, rel- 
evant and potentially important information about this region 
could be obtained by simply searching the PubMed database for 
the combination of terms ^53' and 'Loligo' (Latin for squid). It 
has been shown that the squid p53 homologue contains a C-ter- 
rainal SAM domain, a distinct protein- protein interaction and 
dimerization domain found primarily in developmental regula- 
tors 33 . Furthermore, just weeks after the publication of human 
p73, the gene encoding rat KET protein, a close p73 homologue,
was sequenced 34. With the KET sequence in the database, a PSI-
BLAST search using the conservation profile of the two p73
species readily reveals significant similarity between their C-ter- 
minal regions and numerous SAM domains (Fig. 2). The SAM 
domain in p73 may be involved in dimerization and may medi- 
ate an interaction with another protein(s) involved in transcrip- 
tion regulation and developmental control of gene expression. A 
more thorough literature search or the use of 'awareness' tools 
(see below) would have helped to retrieve, in an automatic man-
ner, potentially important information on p73.
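The literature query described above ('p53' combined with 'Loligo') can today be scripted against NCBI's E-utilities. The sketch below only builds an ESearch URL with the standard `db` and `term` parameters and does not send a request. Using E-utilities is an assumption about present-day tooling; it is not a method described in the text.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public service; no request is sent here).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(*terms: str) -> str:
    """Build an ESearch URL that combines the given terms with AND in PubMed."""
    query = " AND ".join(terms)
    return EUTILS_ESEARCH + "?" + urlencode({"db": "pubmed", "term": query})

print(pubmed_search_url("p53", "Loligo"))
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=p53+AND+Loligo
```

A scheduled script issuing such queries and reporting new PubMed IDs would be one simple form of the 'awareness' tools the authors call for.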


Toward integration of functional and structural features 

In summary, the currently available methods for sequence
analysis are sophisticated, and while further improvements will 
certainly ensue, they are already capable of extracting subtle 
but functionally relevant signals from protein sequences. 
Whether or not a researcher actually reaps maximal benefit 
from such analysis, however, depends on the application of an 
appropriate combination of methods in the correct setting. 



There is no single, universal recipe for this purpose, but we 
have attempted to compile a short checklist, which will mini- 
mize the risk of missing important functional signals hidden in 
protein sequences (Table 2). 

A comprehensive, precisely defined and standardized classifi- 
cation of biological functions is required for the automation of 
the prediction of gene functions. The task of constructing such a 
classification is immensely difficult given that even small pro- 
teins — not to mention large, multidomain ones — are certain to 
have multiple roles in the cell. Classification schemes that have 
been proposed for prokaryotic gene products 35-37 are useful in
comparative genome analysis but grossly oversimplify the prob- 
lem. We believe that, at present, there is still no proper language 
for the adequate and uniform description of functions and 
therefore it is hard to predict when, if at all, a (a)periodic system 
of biological functions may become a reality. 

Generally, the incorporation of known functional information 
into databases at various levels is a pressing need, requiring the 
combined efforts of experimentalists, computational biologists 
and database developers. The challenge is particularly formidable 
as new types of information, such as tissue- and organ-specific 
gene expression patterns on a genome scale as well as numerous 
data on protein interactions and post-translational modifications 
are rapidly becoming available (for details see refs 38,39). 

This continuous flow of information also requires 'update' 
and 'awareness' tools that filter and incorporate incoming data
(sequences, literature, etcetera) and new applications (servers, 
methods). Ideally, these tools will notify researchers upon sub- 
scription using a customized query profile (keywords, sequences 
of interest, problem description) or systematically integrate the 
filtered information into dynamic databases. The major problem 
that remains is the quality of the information, as no data-mining
tools yet exist that can judge the various experimental protocols 
described in the respective literature. 

There is little doubt that a new generation of bioinformatics 
approaches will soon integrate sequence and structure analysis 
methods with awareness tools, information filters and dynamic 
data processing in preparation for the forthcoming postge-
nomics era. However, all these tools will only facilitate but not
replace the work of a scientist who defines the questions and 
interprets the results. 

Note added in proof: Elizabeth Greene & Steven Henikoff have 
constructed an article for the Nature Genetics website
(http://www.genetics.nature.com/gazing/), which provides direct links to a
range of web-based tools for database sequence searches, and 
homology and structural predictions for query protein sequences. 
What We Do Not Know About Sequence Analysis and
Sequence Databases 

The marriage of high-throughput nucleotide sequencing with 
computational methods for the analysis of nucleotide and 
protein sequences has ushered in a new era of molecular
biology. Entire genomes are deposited into the sequence DBs 
at a growing rate. Typically, investigators can use computa- 
tional sequence analysis to assign functions to the majority of 
the open reading frames in genome sequences. That analysis 
can identify a surprisingly large fraction of the genes within 
the organism. That fraction is increasing over time as the 
sequence databases contain a larger fraction of all functional 
domains. 

The growing wealth of information within the sequence 
databases provides a foundation for the biology of the 21st 
Century. We will mine these data for decades to come, 
developing complex and incredibly accurate cellular models 
that can predict the behavior of living systems by integrating 
across the functions of their molecular parts. 

Or will we? Although the preceding scenario is the likely 
one, we would be irresponsible to not consider another 
possible outcome: an explosion of incorrect annotations 
within the sequence databases. Each new sequence deposited 
in the public databases has been annotated with respect to 
those same databases. Functional annotations are propagated 
repeatedly from one sequence to the next, to the next, with no 
record made of the source of a given annotation, leading to a 
potential transitive catastrophe of erroneous annotations. 
Investigators who later attempt to separate the wheat from the 
chaff will discover that they cannot simply retreat to the safety 
of experimentally annotated sequences by ignoring the 
computationally annotated sequences, because the public DBs 
do not explicitly distinguish the two sets. In fact, the public 
sequence DBs keep virtually no tracking information about 
the methods used to annotate their data. 
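The 'transitive catastrophe' argument can be made concrete with a toy model. Assume, purely for illustration, that each annotation transfer independently introduces an error with a fixed probability p and that errors are never corrected. The chance that an annotation is still correct after k transfers is then (1 - p)^k. The 5% rate below is hypothetical; as the text stresses, the real error rates are unknown.

```python
def chain_correct_probability(per_step_error: float, steps: int) -> float:
    """Probability that an annotation is still correct after `steps` transfers,
    assuming independent errors at a fixed rate and no later correction."""
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be in [0, 1]")
    return (1.0 - per_step_error) ** steps

# Hypothetical 5% error per transfer.
for k in (1, 5, 10, 20):
    print(k, round(chain_correct_probability(0.05, k), 3))
```

Under these assumptions, only about 60% of annotations remain correct after ten transfers. Without provenance records, a reader cannot tell how many transfers lie behind a given entry.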

Can we rule this possibility out on any objective grounds? 
No. We have no reliable data regarding either the current rate 
of errors (incorrect functional annotations) within the public 
DBs, nor on the rate of change of that error rate (we do not
even know if it is increasing or decreasing each year). 

Many years of research have led to the development of 
detailed statistical models for sequence-similarity searching 
algorithms such as FASTA and the BLAST family of 
programs. Researchers employ these algorithms to identify 
the functions of novel sequences in two phases. In phase I, 
they identify homologs of a novel sequence. In phase II, they 
infer the function of the novel sequence with respect to the 
homologs that have been identified by examining the range of 
functions of the homologs and the sequence regions those 
homologs share with the novel sequence. A number of studies 
have examined the sensitivity and specificity with which those
algorithms identify homologs of a query sequence in a 
sequence database (phase I). However, few if any studies have 
evaluated the accuracy of phase II, or of the complete 
functional-prediction process. Incorrect functional predictions 
can result from a number of causes, including: divergence of 
function within homologous proteins; confusion or omission 
of functions across multimodular proteins (e.g. if only one
function of a multimodular protein is given in its description 
line, but a region of the protein containing a different function 
is what matched the query sequence); and omission of phase 
II altogether by simply choosing the strongest homolog as the 
source of attributed function. 
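One of the error sources listed above, crediting a query with a function that belongs to a different module of a multimodular hit, can be guarded against by transferring only the functions of subject domains that the alignment actually covers. This is a minimal sketch with hypothetical `Domain` and `Hit` records and an assumed 30-residue minimum overlap. It is not the procedure of any particular program.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    start: int  # 1-based inclusive coordinates on the subject sequence
    end: int
    function: str

@dataclass(frozen=True)
class Hit:
    subject: str
    s_start: int  # aligned region on the subject, 1-based inclusive
    s_end: int

def transferable_functions(hit: Hit, domains: list[Domain], min_overlap: int = 30) -> list[str]:
    """Return only the functions of subject domains covered by the alignment.
    min_overlap (residues) is an assumed cutoff, not a published value."""
    result = []
    for d in domains:
        overlap = min(hit.s_end, d.end) - max(hit.s_start, d.start) + 1
        if overlap >= min_overlap:
            result.append(d.function)
    return result

# Hypothetical two-domain subject; the query aligned only to its C-terminal domain.
domains = [Domain(1, 250, "kinase"), Domain(300, 420, "DNA binding")]
print(transferable_functions(Hit("subject_1", 310, 415), domains))  # prints ['DNA binding']
```

A description line naming only "kinase" for this subject would mislead a best-hit transfer. Checking the aligned coordinates is a small step toward the phase II analysis the text describes.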

Although we know the accuracy with which sequence 
homologs can be determined, we know little about the
accuracy of the overall process of assigning function by
homology. Thus, remarkably, just as we have no idea what is
the rate of errors in the database foundation of biology for the 
21st century, we also have little hard information about the 
accuracy of the most commonly used computational method 
in bioinformatics — the method that underpins all genome
sequence analysis. 

Consider the following additional questions. It is likely that 
different people carry out phase II with somewhat different
methods that produce different accuracies: what is the range 
of accuracies? In the past few years, however, programs such 
as GeneQuiz and Magpie have begun to tackle phase II. What 
are their accuracies? Are the programs more or less accurate 
than expert scientists? To address these questions it would 
help greatly if a sequence-analysis benchmarking test-suite 
were available, i.e., a set of sequences whose functions had 
been established experimentally and were known to be 
correct, and that could be used as a test set for evaluating 
programs or scientists. (Such test suites have been of great 
value in the protein-structure prediction field.) To critically 
evaluate these programs and to learn from their successes and 
failures, it will prove crucial for their authors to publish the 
decision rules that the programs employ. However, these rules 
have typically not been published in past articles. Although a 
number of new methods have recently been proposed for 
automated sequence analysis (such as the COGS method), it 
is hard to see why we should have any confidence in the 
claims made about the accuracy of these methods when those 
claims are based on a handful of examples rather than on 
systematic empirical studies. 
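A benchmarking suite of the kind proposed here would let phase II methods, whether programs or people, be scored against experimentally established functions. The sketch below assumes a hypothetical reference set mapping sequence identifiers to verified function labels and reports two simple numbers: coverage and accuracy. Exact string matching of labels is a simplification; a real evaluation would need a controlled vocabulary of functions.

```python
def score_predictions(predicted: dict[str, str], reference: dict[str, str]) -> dict[str, float]:
    """Compare predicted functions with an experimentally established reference set.
    coverage: fraction of reference sequences that received a prediction.
    accuracy: fraction of those predictions that match the reference label."""
    evaluated = [sid for sid in reference if sid in predicted]
    correct = sum(predicted[sid] == reference[sid] for sid in evaluated)
    coverage = len(evaluated) / len(reference) if reference else 0.0
    accuracy = correct / len(evaluated) if evaluated else 0.0
    return {"coverage": coverage, "accuracy": accuracy}

# Hypothetical reference labels and predictions.
reference = {"s1": "kinase", "s2": "helicase", "s3": "exonuclease", "s4": "chloride channel"}
predicted = {"s1": "kinase", "s2": "helicase", "s3": "helicase"}
print(score_predictions(predicted, reference))  # prints {'coverage': 0.75, 'accuracy': 0.6666666666666666}
```

Reporting coverage and accuracy separately matters: a method can look accurate simply by declining to predict difficult cases.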

To assemble a set of sequences for a sequence-analysis 
benchmark, we would attempt to find a set of sequences 
whose functions are thought to be correct with high 
confidence. Unfortunately, the current sequence DBs provide 
little assistance in finding such a set of sequences. We have
more faith in the correctness of those sequences whose
functions were determined experimentally, rather than
through computational means. But the sequence DBs do not
distinguish experimentally determined functions from those
determined computationally, much less do they associate a 
level of confidence with each functional assignment. (Note 
that Swiss-Prot does distinguish some sequence features as
'from sequence analysis', but not the overall function assigned
to the sequence.) 

If we had reason to question the accuracy of a particular 
functional assignment in a sequence DB, what would we want 
to know about the sequence? We would want to know if the 
function was determined experimentally or computationally; 
if computationally, we would want to know whether a person 
or a program made the final decision. If a program, we would 
want to know which program; if a person, we would want to 
know which similarity-searching program (or set of 
programs) the person relied on. We would also like to know
when the function was assigned (was the assignment based on 
up-to-date computational techniques and up-to-date 
databases?). We would also like to know the overall rate of 
errors in the functional assignments in the sequence DB we 
are looking at. 
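The questions listed above map naturally onto a small provenance record attached to each functional assignment. The dataclass below is a hypothetical schema sketch, not the schema of GenBank, Swiss-Prot, GSDB or GAIA, and all field names are illustrative. It enforces one rule drawn from the text's wish list: a computational annotation must state whether a program or a person made the final decision.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Evidence(Enum):
    EXPERIMENTAL = "experimental"
    COMPUTATIONAL = "computational"

@dataclass(frozen=True)
class AnnotationProvenance:
    function: str
    evidence: Evidence
    assigned_on: date                        # when the function was assigned
    decided_by_program: Optional[str] = None  # set when a program made the final call
    curator: Optional[str] = None             # set when a person made the final call
    search_tools: tuple[str, ...] = ()        # similarity-search programs relied on
    database_release: Optional[str] = None    # DB version searched, to judge currency

    def __post_init__(self) -> None:
        if self.evidence is Evidence.COMPUTATIONAL and not (self.decided_by_program or self.curator):
            raise ValueError("computational annotations must name a program or a curator")

def experimentally_supported(records: list[AnnotationProvenance]) -> list[AnnotationProvenance]:
    """Keep only annotations backed by experiment."""
    return [r for r in records if r.evidence is Evidence.EXPERIMENTAL]

# Hypothetical records.
records = [
    AnnotationProvenance("chloride channel", Evidence.EXPERIMENTAL, date(1998, 4, 1)),
    AnnotationProvenance("helicase", Evidence.COMPUTATIONAL, date(1998, 4, 1),
                         decided_by_program="hypothetical-annotator",
                         search_tools=("BLAST",), database_release="hypothetical-release"),
]
print([r.function for r in experimentally_supported(records)])  # prints ['chloride channel']
```

With such fields in place, a program could retreat to experimentally annotated entries, the step the text notes is currently impossible.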

It is extremely difficult both to estimate the error rates of any 
sequence DB as a whole, and to estimate the reliability of any 
particular entry within these DBs, for three principal reasons. 

First, the DBs themselves provide very little metadata — 
historical or tracking data ABOUT the primary data within the 
DB (such as a level of confidence in a functional annotation, 
or the name of the program that created the functional 
annotation). Without such metadata it will be difficult for 
scientists — and virtually impossible for programs — to make 
intelligent decisions about what data to trust. 

Second, the sequence DBs typically do not publish detailed 
descriptions of their methodologies to tell us exactly what 
manual and automated procedures they subject each sequence 
to. For example, we do not know what checking or correction 
procedures new GenBank entries are subjected to. We also do 
not know precisely what sequence-analysis procedures are 
used to annotate Swiss-Prot or TrEMBL. These DBs do not
simply accept sequences annotated by other scientists — they 
perform sizeable annotation operations of their own which 
should be documented in detail. We also do not know the rate 
at which wrong annotations in GenBank or Swiss-Prot are 
corrected. The rates are probably very different because 
Swiss-Prot accepts corrections from its user community, 
whereas GenBank apparently only accepts corrections from
the author of an entry. We also do not know the error rate in
the corrections! A description of the procedures used by each DB
would in effect give us 'default metadata' for each DB entry. In fairness, one
reason such descriptions are not published is that journals 
have apparently not accepted descriptions of these procedures 
that have been submitted by the database authors. Such papers 
must be published. 

Third, published descriptions of full genome sequences 
typically do not explain in detail the sequence-analysis 
procedure used for the genome, e.g. exactly what set of
analysis programs was applied to each sequence, and how
their outputs were combined. Such accounts would again
provide default metadata for interpreting the annotation of 
each genome. 

Many of these ideas have been circulating in the 
bioinformatics community for years. The GSDB project at the
National Center for Genome Resources and the GAIA project 
at the University of Pennsylvania have addressed some of 
these issues by creating database schemas that represent 
extensive metadata about sequence annotations, but their 
ideas have not been adopted by the other public sequence 
DBs. By allowing these problems to fester we risk a state of 
affairs where the scientific community loses all faith in the 
annotations within the public databases, and where the 
databases have become so large that they cannot be revised 
within an acceptable time frame. 

The following recommendations address these problems: 

• The public sequence DBs should develop 
next-generation schemas that encode metadata about 
sequence annotations. Significant value can be obtained 
quickly from relatively simple schema extensions. 

• The public sequence DBs should thoroughly document 
their operating procedures, such as annotation strategies 
and update policies. 

• Research is required to estimate the current error rates of 
the sequence DBs. 

• Research is required to estimate the error rate of 
functional annotation by different methods of 
computational sequence analysis. 

• A sequence-analysis benchmarking suite should be 
developed to allow systematic evaluation of automated 
programs that perform functional annotation.
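To make the first recommendation concrete, the following is a minimal sketch, in Python, of the kind of simple schema extension described above: a provenance record for one functional annotation that holds the items listed earlier (whether a person or a program made the call, which program or similarity-search tools were relied on, when the assignment was made, and a recorded confidence level). The class name, field names and the filtering rule are hypothetical illustrations only; they are not part of GenBank, Swiss-Prot, GSDB, GAIA or any other existing schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotationProvenance:
    """Hypothetical metadata record attached to one functional annotation."""
    function: str                  # the assigned function
    assigned_by: str               # "person" or "program"
    program: Optional[str]         # program that made the call, if assigned_by == "program"
    search_tools: Tuple[str, ...]  # similarity-search programs relied on
    assigned_on: date              # when the assignment was made
    confidence: float              # recorded confidence, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject records that violate the (assumed) schema constraints.
        if self.assigned_by not in ("person", "program"):
            raise ValueError("assigned_by must be 'person' or 'program'")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def is_trusted(record: AnnotationProvenance,
               min_confidence: float,
               not_before: date) -> bool:
    """Hypothetical policy: keep annotations that are confident and recent enough."""
    return record.confidence >= min_confidence and record.assigned_on >= not_before


# Usage with made-up values:
rec = AnnotationProvenance(
    function="ATP synthase subunit alpha",
    assigned_by="program",
    program="BLASTP",
    search_tools=("BLASTP",),
    assigned_on=date(1998, 5, 1),
    confidence=0.9,
)
print(is_trusted(rec, min_confidence=0.8, not_before=date(1997, 1, 1)))  # True
```

The point of the sketch is that once such fields exist, a program (not only a scientist) can apply an explicit trust policy; the thresholds shown are arbitrary.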

Peter D. Karp 
pkarp@PangeaSystems.com
The nine reviews in this section chart new methods for
understanding the biological messages of genome
sequences. The accelerating rate at which these
sequences are being determined has created a demand for 
informative analytical methods. The accumulation of new 
data does not in itself lead to increased knowledge. Rather, 
it challenges us to improve methods for the filtering and 
processing of sequences to identify the subtle signals 
therein. This need is heightened by the advent of 
sequences of entire genomes; these allow qualitatively 
new features to be detected and open new views on the 
evolution of genetic material. The initial progress of this 
emerging science of functional genomics is impressive and 
is documented in this set of reviews. 

Fortunately, one of the first observations to emerge from 
comparative genome analysis is the robustness of genetic 
material that has undergone rearrangement. It may be 
shuffled, horizontally transferred and disrupted, but
nevertheless it often maintains its functionality in different
organisms. One of the biological themes seems to be
'modularity', which shows up in noncoding DNA, as well
as within the genes, and is also manifest in the three- 
dimensional structures of their products. 

Modularity in DNA is created by duplication events
followed by modifications, leading to repetitive segments of
DNA. Jerzy Jurka (pp 333-337) reviews the evolution of
the repetitive transposable elements that comprise a
considerable fraction of the total DNA in eukaryotic genomes.
Their classification and improved detection are essential for
genome annotation and also for cleaning expressed
sequence-tag databases. Jurka emphasizes that the
previous view, that these repeats are merely selfish elements,
needs to be expanded. Also, whereas most current
applications treat repeats only as 'waste' to be removed to
reduce the search space, the repeats seem to have diverse
roles in the genome that can be exploited in a wide range of
applications, ranging from population studies to mapping and
genomic engineering.

Detection and analysis of repeats is also a challenge at the
protein level. Jaap Heringa (pp 338-345) reviews the shift
in focus during the past year from repeats at the protein
domain level to much shorter fragments that are
associated with protein malfunction and genetic diseases. At both
the domain level and the subdomain level, the
relationship between sequence repeats and three-dimensional
structure remains a puzzle.

After the detection of repeats, it is crucial to identify the
genes in the genomes. Christopher Burge and Samuel
Karlin (pp 346-354) review the recent progress in method
development, and also point out future directions. The
problem of finding genes (particularly in eukaryotes) is far
from solved. This is no surprise, because various weak
translational, transcriptional and splicing signals in the DNA
have to be identified and combined with experimental
information, such as that from expressed sequence tags and
trapped exons.

Identification of genes is essential, but their full value
comes only with their functional and structural annotation.
Using the first complete prokaryotic genomes, Eugene
Koonin and colleagues (pp 355-363) discuss important
aspects of this annotation process, such as the
identification of orthologs and the assignment of folds and catalytic
activities. The power of comparative sequence analysis,
well known at the level of individual proteins, is now also
found at the genome level.

There is still much, however, that is not evident from
sequence. Genetic mechanisms can cause modifications of
sequence (such as circular permutations, domain
insertions and secondary-structure rearrangements) that are
beyond the limits of detection of current sequence-analysis
methods. Robert Russell and Chris Ponting
(pp 364-371) summarize cases that can be deciphered only
by the analysis of protein topology. Their review
emphasizes a general point: in many cases, only structural
information can illuminate some of the phenomena that
hamper sequence analysis.

Structural knowledge can increase the sensitivity of
sequence searches. Liisa Holm (pp 372-379) shows how
one can exploit superposition of three-dimensional
structures for the unification of protein sequence families and
the detection of remote homologues. Yet structural
similarity does not lead to iron-clad functional predictions.
This is illustrated by the examples that Alexey Murzin
(pp 380-387) presents. These examples also show how a
wealth of structural data can be correlated in the light of
protein evolution.

The complexity of the course of evolution adds
complications to genomic analysis. Structural similarity does not
necessarily mean a common evolutionary origin, and
homologous sequences may evolve into different folds
(according to current classification schemes). A single
function can be found on similar structural scaffolds, so
there are numerous examples of parallel evolution towards
a similar functionality, even based on extremely different
folds. This adds complexity to sequence annotation, as
most of the current knowledge on sequenced genomes
(particularly beyond the well characterized yeast and
Escherichia coli genomes) comes from functional inference
via homology searches. Thus we can never be sure that a
detected homologue has exactly the same function in
different genomes. On the other hand, when we hunt for a
particular function in a genome, it is always possible that
an unrelated protein has acquired this particular function.

A first step towards clarifying such problems will be
reliable functional annotation that discriminates between in
vivo, in vitro and homology-derived data. Clarification
also requires, where possible, a structure-based annotation
of functional features. At the start, we need to ask what
kind of features can and should be derived and described
for each sequence. Functional classifications are essential
if we want to describe metabolism and, ultimately,
phenotypes. Monica Riley (pp 388-392) summarizes many of
the problems in function classification, including seman-
reach a consistent annotation level, but will we ever
achieve annotation that is both reasonably complete and
computer-readable? Function always depends on the
context, and yet only molecular features can be deduced
directly from sequence. Some information comes from the
availability of entire genomes: for example, the absence of
genes and/or functions can be included in predictions.

Today, what we predict from sequences is at best
fragmentary and qualitative, for example, the presence or absence
of a certain gene or structure or function or pathway. This is
not enough to describe cellular processes. Fortunately,
there are experimental tools of growing power for the
support and extension of genome predictions, such as direct
measures of gene expression and protein interaction. One
of the leading techniques is mass spectrometry. Bernhard
Kuster and Matthias Mann (pp 393-400) describe how mass
spectrometry can be used to sequence and identify proteins
that have post-translational modifications, even though
some of these modifications cannot yet be predicted from sequence.

Although sequence and structure space is not infinite, we
will probably never be able to explore it completely
(consider, for example, the extinction of species with their
genetic material and the rapid modification of virus
sequences). With model genomes from evolutionarily
distant species becoming available, however, we can make a
start at this exploration for humans and other living
organisms. In this endeavor, the methods for analysis and
annotation that are being developed today will be of the utmost
importance in future attempts to bridge the genotype and
phenotype of organisms.